Abstract. We consider the minimization problem of $\phi$-divergences between a given probability measure $P$ and subsets $\Omega$ of the vector space $\mathcal{M}_{\mathcal{F}}$ of all signed finite measures which integrate a given class $\mathcal{F}$ of bounded or unbounded measurable functions. The vector space $\mathcal{M}_{\mathcal{F}}$ is endowed with the weak topology induced by the class $\mathcal{F} \cup \mathcal{B}_b$, where $\mathcal{B}_b$ is the class of all bounded measurable functions. We treat the problems of existence and characterization of the $\phi$-projections of $P$ on $\Omega$. We consider also the dual equality and the dual attainment problems when $\Omega$ is defined by linear constraints.

1. Introduction and notation

Let $(\mathcal{X}, \mathcal{B})$ be a measurable space and $P$ be a given reference probability measure (p.m.) on $(\mathcal{X}, \mathcal{B})$. Denote by $\mathcal{M}$ the real vector space of all signed finite measures on $(\mathcal{X}, \mathcal{B})$ and by $\mathcal{M}(P)$ the vector subspace of all signed finite measures absolutely continuous (a.c.) with respect to (w.r.t.) $P$. Denote also by $\mathcal{M}^1$ the set of all p.m.'s on $(\mathcal{X}, \mathcal{B})$ and by $\mathcal{M}^1(P)$ the subset of all p.m.'s a.c. w.r.t. $P$. Let $\varphi$ be a proper^1 closed^2 convex function from $]-\infty, +\infty[$ to $[0, +\infty]$ with $\varphi(1) = 0$ and such that its domain $\mathrm{dom}\,\varphi := \{x \in \mathbb{R} \text{ such that } \varphi(x) < \infty\}$ is an interval with endpoints $a_\varphi < 1 < b_\varphi$ (which may be finite or infinite). For any signed finite measure $Q$ in $\mathcal{M}(P)$, the $\phi$-divergence between $Q$ and $P$ is defined by

(1.1) $$\phi(Q, P) := \int_{\mathcal{X}} \varphi\left(\frac{dQ}{dP}(x)\right) dP(x).$$

1 We say a function is proper if its domain is nonempty.

2 The closedness of $\varphi$ means that if $a_\varphi$ or $b_\varphi$ are finite numbers then $\varphi(x)$ tends to $\varphi(a_\varphi)$ or $\varphi(b_\varphi)$ when $x \downarrow a_\varphi$ or $x \uparrow b_\varphi$, respectively.
When $Q$ is not a.c. w.r.t. $P$, we set $\phi(Q, P) = +\infty$. The $\phi$-divergences between p.m.'s were introduced by Csiszár (1963) as "$f$-divergences". The definition of $\phi$-divergences of Csiszár (1963) between p.m.'s requires a common dominating $\sigma$-finite measure, noted $\lambda$, for $Q$ and $P$. Note that the two definitions of $\phi$-divergences coincide on the set of all p.m.'s a.c. w.r.t. $P$ and dominated by $\lambda$. The $\phi$-divergences between any signed finite measure $Q$ and a p.m. $P$ were introduced by Csiszár et al. (1999); they gave the following definition

(1.2) $$\phi(Q, P) := \int_{\mathcal{X}} \varphi(q)\, dP + b\, \sigma_Q^+(\mathcal{X}) - a\, \sigma_Q^-(\mathcal{X}),$$

where $a := \lim_{x \to -\infty} \varphi(x)/x$, $b := \lim_{x \to +\infty} \varphi(x)/x$ and

$$Q = qP + \sigma_Q, \qquad \sigma_Q = \sigma_Q^+ - \sigma_Q^-$$

are the Lebesgue decomposition of $Q$ and the Jordan decomposition of the singular part $\sigma_Q$, respectively. The definitions (1.1) and (1.2) coincide when $Q$ is a.c. w.r.t. $P$ or when $a = -\infty$ or $b = +\infty$. Since we will consider optimization of $Q \mapsto \phi(Q, P)$ on sets of signed finite measures a.c. w.r.t. $P$, it is more adequate for our sake to use the definition (1.1).



For all p.m. $P$, the mappings $Q \in \mathcal{M} \mapsto \phi(Q, P)$ are convex and take nonnegative values. When $Q = P$ then $\phi(Q, P) = 0$. Furthermore, if the function $x \mapsto \varphi(x)$ is strictly convex on a neighborhood of $x = 1$, then the following basic property holds

(1.3) $$\phi(Q, P) = 0 \text{ if and only if } Q = P.$$

All these properties are presented in Csiszár (1963), Csiszár (1967a), Csiszár (1967b) and Liese and Vajda (1987) chapter 1, for $\phi$-divergences defined on the set of all p.m.'s $\mathcal{M}^1$. When the $\phi$-divergences are defined on $\mathcal{M}$, then the same properties hold.



When defined on $\mathcal{M}^1$, the Kullback-Leibler ($KL$), modified Kullback-Leibler ($KL_m$), $\chi^2$, modified $\chi^2$ ($\chi^2_m$), Hellinger ($H$), and $L_1$ divergences are respectively associated to the convex functions $\varphi(x) = x \log x - x + 1$, $\varphi(x) = -\log x + x - 1$, $\varphi(x) = \frac{1}{2}(x - 1)^2$, $\varphi(x) = \frac{1}{2}(x - 1)^2/x$, $\varphi(x) = 2(\sqrt{x} - 1)^2$ and $\varphi(x) = |x - 1|$. All those divergences, except the $L_1$ one, belong to the class of power divergences introduced in Cressie and Read (1984) (see also Liese and Vajda (1987) chapter 2). They are defined through the class of convex functions

(1.4) $$x \in\, ]0, +\infty[\; \mapsto \varphi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)}$$

if $\gamma \in \mathbb{R} \setminus \{0, 1\}$, $\varphi_0(x) := -\log x + x - 1$ and $\varphi_1(x) := x \log x - x + 1$. (For all $\gamma \in \mathbb{R}$, we define $\varphi_\gamma(0) := \lim_{x \downarrow 0} \varphi_\gamma(x)$.) So, the $KL$-divergence is associated to $\varphi_1$, the $KL_m$ to $\varphi_0$, the $\chi^2$ to $\varphi_2$, the $\chi^2_m$ to $\varphi_{-1}$ and the Hellinger distance to $\varphi_{1/2}$.
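
As a numerical illustration of (1.1) and (1.4), the following Python sketch evaluates power divergences between finite measures given as weight vectors on a common finite support. The helper names `phi_gamma` and `divergence`, the finite support and the sample weights are assumptions made for illustration only. Negative density values are sent to $+\infty$ except for $\gamma = 2$, where $\frac{1}{2}(x-1)^2$ is finite and convex on the whole real line; this convention is also an assumption of the sketch.

```python
import math

def phi_gamma(x, gamma):
    """Convex function of the power divergence family (1.4).

    Assumption of this sketch: for x < 0 the value is +inf, except for
    gamma = 2 where (x - 1)**2 / 2 is finite and convex on the whole line.
    """
    if gamma == 2:
        return 0.5 * (x - 1.0) ** 2
    if x < 0:
        return math.inf
    if gamma == 1:  # Kullback-Leibler
        return x * math.log(x) - x + 1.0 if x > 0 else 1.0
    if gamma == 0:  # modified Kullback-Leibler
        return -math.log(x) + x - 1.0 if x > 0 else math.inf
    if x == 0:  # limit of phi_gamma(x) as x decreases to 0
        return 1.0 / gamma if gamma > 0 else math.inf
    return (x ** gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))

def divergence(q, p, gamma):
    """phi-divergence (1.1) between a signed measure q and a p.m. p.

    q and p are lists of weights on the same finite support. If q charges
    a point of p-mass zero, q is not absolutely continuous w.r.t. p and
    the divergence is +inf.
    """
    total = 0.0
    for qi, pi in zip(q, p):
        if pi == 0:
            if qi != 0:
                return math.inf
            continue
        total += phi_gamma(qi / pi, gamma) * pi
    return total

p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]          # a probability measure
signed = [0.75, 0.5, -0.25]    # total mass one, one negative weight

kl = divergence(q, p, 1)
kl_m = divergence(q, p, 0)
chi2 = divergence(q, p, 2)
hellinger = divergence(q, p, 0.5)
chi2_signed = divergence(signed, p, 2)   # finite
kl_signed = divergence(signed, p, 1)     # +inf under the convention above
```

In this sketch only the $\chi^2$-divergence stays finite for the measure with a negative weight, which is the situation where working on signed measures rather than on p.m.'s makes a difference.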

The Kullback-Leibler divergence ($KL$-divergence) is sometimes called Boltzmann-Shannon relative entropy. It appears in the domain of large deviations and it is frequently used for reconstruction of laws, and in particular in the classical moment problem (see e.g. Csiszár et al. (1999) and the references therein). The modified Kullback-Leibler divergence ($KL_m$-divergence) is sometimes called Burg relative entropy. It is frequently used in Statistics and it leads to efficient methods in statistical estimation and test problems; in fact, the celebrated "maximum likelihood" method can be seen as an optimization problem of the $KL_m$-divergence between the discrete or continuous parametric model and the empirical measure associated to the data; see Keziou (2003a) and Broniatowski and Keziou (2003). On the other hand, the recent "empirical likelihood" method can also be seen as an optimization problem of the $KL_m$-divergence between some set of measures satisfying some linear constraints and the empirical measure associated to the data; see Owen (2001) and the references therein, Bertail (2003), Bertail (2004) and Broniatowski and Keziou (2004). The Hellinger divergence is also used in Statistics; it leads to robust statistical methods in parametric and semi-parametric models; see Beran (1977), Lindsay (1994), Jimenez and Shao (2001) and Broniatowski and Keziou (2004).



We extend the definition of the power divergence functions $Q \in \mathcal{M}^1 \mapsto \phi_\gamma(Q, P)$ onto the whole vector space of signed finite measures $\mathcal{M}$ via the extension of the definition of the convex functions $\varphi_\gamma$: for all $\gamma \in \mathbb{R}$ such that the function $x \mapsto \varphi_\gamma(x)$ is not defined on $]-\infty, 0[$ or defined but not convex on the whole of $\mathbb{R}$, we extend its definition as follows

(1.5) $$x \in\, ]-\infty, +\infty[\; \mapsto \begin{cases} \varphi_\gamma(x) & \text{if } x \in [0, +\infty[, \\ +\infty & \text{if } x \in\, ]-\infty, 0[. \end{cases}$$

Note that for the $\chi^2$-divergence, for instance, $\varphi_2(x) := \frac{1}{2}(x - 1)^2$ is defined and convex on the whole of $\mathbb{R}$.



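The conjugate (or Fenchel-Legendre transform) of $\varphi$ will be denoted $\varphi^*$, i.e.,

(1.6) $$t \in \mathbb{R} \mapsto \varphi^*(t) := \sup_{x \in \mathbb{R}} \{tx - \varphi(x)\},$$

and the endpoints of $\mathrm{dom}\,\varphi^*$ (the domain of $\varphi^*$) will be denoted $a_{\varphi^*}$ and $b_{\varphi^*}$ with $a_{\varphi^*} \leq b_{\varphi^*}$. Note that $\varphi^*$ is a proper closed convex function. In particular, $a_{\varphi^*} \leq 0 \leq b_{\varphi^*}$, $\varphi^*(0) = 0$ and

(1.7) $$a_{\varphi^*} = \lim_{y \to -\infty} \frac{\varphi(y)}{y}, \qquad b_{\varphi^*} = \lim_{y \to +\infty} \frac{\varphi(y)}{y}.$$

By the closedness of $\varphi$, the conjugate $\varphi^{**}$ of $\varphi^*$ coincides with $\varphi$, i.e.,

(1.8) $$\varphi^{**}(t) := \sup_{x \in \mathbb{R}} \{tx - \varphi^*(x)\} = \varphi(t), \quad \text{for all } t \in \mathbb{R}.$$

For the proper convex functions defined on $\mathbb{R}$ (endowed with the usual topology), the lower semi-continuity^3 and the closedness properties are equivalent.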
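
The conjugate in (1.6) can be checked numerically on simple cases. The sketch below approximates the supremum by a maximum over a grid and compares it with the closed forms $\varphi^*(t) = e^t - 1$ for the $KL$ function and $\varphi^*(t) = t + t^2/2$ for the $\chi^2$ function, both obtained by elementary calculus. The function names, the grids and the chosen values of $t$ are assumptions of this illustration, and the grid maximum is only an approximation of the supremum.

```python
import math

def conjugate_on_grid(phi, t, xs):
    """Approximate phi*(t) = sup_x {t*x - phi(x)} by a maximum over the grid xs."""
    return max(t * x - phi(x) for x in xs)

def phi_kl(x):
    """x log x - x + 1 on ]0, +inf[."""
    return x * math.log(x) - x + 1.0

def phi_chi2(x):
    """(x - 1)^2 / 2 on the whole real line."""
    return 0.5 * (x - 1.0) ** 2

positive_grid = [i / 1000.0 for i in range(1, 20001)]    # ]0, 20]
real_grid = [i / 1000.0 for i in range(-20000, 20001)]   # [-20, 20]

errors = []
for t in (-1.0, 0.0, 0.5, 1.5):
    kl_error = abs(conjugate_on_grid(phi_kl, t, positive_grid) - (math.exp(t) - 1.0))
    chi2_error = abs(conjugate_on_grid(phi_chi2, t, real_grid) - (t + 0.5 * t * t))
    errors.append((t, kl_error, chi2_error))

print(errors)
```

Both conjugates are finite on the whole real line, so $a_{\varphi^*} = -\infty$ and $b_{\varphi^*} = +\infty$ for these two functions.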



We say that $\varphi$ (resp. $\varphi^*$) is differentiable if it is differentiable on $]a_\varphi, b_\varphi[$ (resp. $]a_{\varphi^*}, b_{\varphi^*}[$), the interior of its domain. We say also that $\varphi$ (resp. $\varphi^*$) is strictly convex if it is strictly convex on $]a_\varphi, b_\varphi[$ (resp. $]a_{\varphi^*}, b_{\varphi^*}[$).

The strict convexity of $\varphi$ is equivalent to the condition that its conjugate $\varphi^*$ is essentially smooth, i.e., differentiable with

(1.9) $$\lim_{t \downarrow a_{\varphi^*}} \varphi^{*\prime}(t) = -\infty \ \text{ if } a_{\varphi^*} > -\infty, \qquad \lim_{t \uparrow b_{\varphi^*}} \varphi^{*\prime}(t) = +\infty \ \text{ if } b_{\varphi^*} < +\infty.$$

Conversely, $\varphi$ is essentially smooth if and only if $\varphi^*$ is strictly convex; see e.g. Rockafellar (1970) section 26 for the proofs of these properties.



If $\varphi$ is differentiable, we denote by $\varphi'$ the derivative function of $\varphi$, and we define $\varphi'(a_\varphi)$ and $\varphi'(b_\varphi)$ to be the limits (which may be finite or infinite) $\lim_{x \downarrow a_\varphi} \varphi'(x)$ and $\lim_{x \uparrow b_\varphi} \varphi'(x)$, respectively. We denote by $\mathrm{Im}\,\varphi'$ the set of all values of the function $\varphi'$, i.e., $\mathrm{Im}\,\varphi' := \{\varphi'(x) \text{ such that } x \in [a_\varphi, b_\varphi]\}$. If additionally the function $\varphi$ is strictly convex, then $\varphi'$ is increasing on $[a_\varphi, b_\varphi]$. Hence, it is a one-to-one function from $[a_\varphi, b_\varphi]$ to $\mathrm{Im}\,\varphi'$; we denote in this case by ${\varphi'}^{-1}$ the inverse function of $\varphi'$ from $\mathrm{Im}\,\varphi'$ to $[a_\varphi, b_\varphi]$.

3 We say a function $\varphi$ is lower semi-continuous if the level sets $\{x \text{ such that } \varphi(x) \leq \alpha\}$, $\alpha \in \mathbb{R}$, are closed.

Note that if $\varphi$ is differentiable, then for all $x \in\, ]a_\varphi, b_\varphi[$,

(1.10) $$\varphi^*(\varphi'(x)) = x \varphi'(x) - \varphi(x).$$

If additionally $\varphi$ is strictly convex, then for all $t \in \mathrm{Im}\,\varphi'$ we have

(1.11) $$\varphi^*(t) = t\, {\varphi'}^{-1}(t) - \varphi\left({\varphi'}^{-1}(t)\right) \quad \text{and} \quad \varphi^{*\prime}(t) = {\varphi'}^{-1}(t).$$

On the other hand, if $\varphi$ is essentially smooth, then the interior of the domain of $\varphi^*$ coincides with that of $\mathrm{Im}\,\varphi'$, i.e., $(a_{\varphi^*}, b_{\varphi^*}) = (\varphi'(a_\varphi), \varphi'(b_\varphi))$.
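
To see (1.10) and (1.11) at work, one may take the Kullback-Leibler function $\varphi(x) = x \log x - x + 1$ on $]0, +\infty[$. The following worked computation is only an illustration for this particular choice of $\varphi$:

```latex
\begin{align*}
\varphi'(x) &= \log x, \qquad {\varphi'}^{-1}(t) = e^{t}, \qquad t \in \mathbb{R},\\
\varphi^*(t) &= t\,{\varphi'}^{-1}(t) - \varphi\bigl({\varphi'}^{-1}(t)\bigr)
             = t e^{t} - \bigl(t e^{t} - e^{t} + 1\bigr) = e^{t} - 1,\\
\varphi^{*\prime}(t) &= e^{t} = {\varphi'}^{-1}(t),\\
\varphi^*\bigl(\varphi'(x)\bigr) &= e^{\log x} - 1 = x - 1
             = x \log x - (x \log x - x + 1) = x \varphi'(x) - \varphi(x).
\end{align*}
```

Here $\varphi'(a_\varphi) = -\infty$ and $\varphi'(b_\varphi) = +\infty$, so the interior of the domain of $\varphi^*$ is the whole real line, in agreement with the last statement above.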

The domain of the $\phi$-divergence will be denoted $\mathrm{dom}\,\phi$, i.e.,

(1.12) $$\mathrm{dom}\,\phi := \{Q \in \mathcal{M} \text{ such that } \phi(Q, P) < \infty\}.$$

Definition 1.1. Let $\Omega$ be some subset in $\mathcal{M}$. The $\phi$-divergence between the set $\Omega$ and a p.m. $P$, noted $\phi(\Omega, P)$, is

$$\phi(\Omega, P) := \inf_{Q \in \Omega} \phi(Q, P).$$

Definition 1.2. Assume that $\phi(\Omega, P)$ is finite. A measure $Q^* \in \Omega$ such that

$$\phi(Q^*, P) \leq \phi(Q, P) \quad \text{for all } Q \in \Omega$$

is called a $\phi$-projection of $P$ on $\Omega$. This projection may not exist, or may not be uniquely defined.

If $\varphi$ is strictly convex, then the function $Q \in \mathcal{M}(P) \mapsto \phi(Q, P)$ is strictly convex, and the $\phi$-projection of $P$ on some convex set $\Omega$ is uniquely defined whenever it exists.

Let $g_i : \mathcal{X} \to \mathbb{R}$, $i = 1, \ldots, l$, be measurable real valued functions on $\mathcal{X}$. Denote $g := (g_0, g_1, \ldots, g_l)^T$ with $g_0 := 1_{\mathcal{X}}$. We assume that the functions $g_0, g_1, \ldots, g_l$ are linearly independent in the following sense: $P\{\lambda^T g(x) \neq 0\} > 0$ for any $\lambda \in \mathbb{R}^{1+l}$ with $\lambda \neq 0$. For all $\lambda \in \mathbb{R}^{1+l}$ we denote by $\lambda_0, \lambda_1, \ldots, \lambda_l$ the $(1 + l)$ coordinates of $\lambda$.

Let us denote by $\mathcal{M}_g$ the set of all signed finite measures with total mass one, a.c. w.r.t. $P$, which integrate the functions $g_i$ and satisfy a finite number of linear constraints, i.e.,

(1.13) $$\mathcal{M}_g := \left\{Q \in \mathcal{M}(P) \text{ such that } Q(\mathcal{X}) = 1 \text{ and } \int_{\mathcal{X}} g_i(x)\, dQ(x) = 0,\ i = 1, \ldots, l\right\}.$$

We consider the optimization problem

(1.14) $$\inf_{Q \in \mathcal{M}_g} \phi(Q, P).$$

The Lagrangian "dual" problem associated with (1.14) is

(1.15) $$\sup_{\lambda \in \mathbb{R}^{1+l}} \left\{\lambda_0 - \int_{\mathcal{X}} \varphi^*\left(\lambda^T g(x)\right) dP(x)\right\}.$$

We will consider the problem of the "dual" equality inf (1.14) = sup (1.15), the existence of optimal solutions in (1.15), and in particular the problems of the existence and the characterization of the optimal solutions in (1.14), i.e., the $\phi$-projections of $P$ on the set $\mathcal{M}_g$.

These problems intervene in the domain of the reconstruction of laws, in particular, the classical moment problem. They also appear frequently in Statistics; in fact, the recent "empirical likelihood" method, which is the nonparametric version of the celebrated maximum likelihood method, can be seen as an optimization problem of the $KL_m$-divergence between some set of measures defined as in (1.13) and the empirical measure associated to the data.

In the vocabulary of the duality theory, a measure $Q$ in $\mathcal{M}_g$ which realizes the infimum in (1.14) (i.e., a $\phi$-projection of $P$ on $\mathcal{M}_g$ in the vocabulary of $\phi$-divergences theory) is called "a primal optimal solution" or simply "an optimal solution", while a point $\overline{\lambda}$ in $\mathbb{R}^{1+l}$ realizing the supremum in (1.15) is called "a dual optimal solution".

For the optimization problem of a convex function $\psi : \mathbb{R}^n \to\, ]-\infty, +\infty]$ on convex sets $C$ in $\mathbb{R}^n$ subject to linear constraints $Ax = b \in \mathbb{R}^m$, where $A$ is some $m \times n$ matrix, a sufficient condition, in order that the equality

(1.16) $$\inf_{\{x \in C;\ Ax = b\}} \psi(x) = \sup_{t \in \mathbb{R}^m} \left\{b^T t - \psi^*(A^T t)\right\}$$

holds with dual attainment, is that there exists a point $x$ in the relative interior^4 of the convex set $C \cap \mathrm{dom}\,\psi$ such that $Ax = b$. See e.g. Rockafellar (1970) for the proofs of these results.
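
The finite dimensional equality (1.16) can be checked by hand in a small case. The sketch below takes $\psi(x) = \frac{1}{2}\|x\|^2$ on $C = \mathbb{R}^3$ with a single linear constraint ($m = 1$); for this choice $\psi^*(y) = \frac{1}{2}\|y\|^2$. The function names and the numerical data are assumptions made for illustration, not part of the results quoted above.

```python
def psi(x):
    """psi(x) = ||x||^2 / 2, whose conjugate is psi*(y) = ||y||^2 / 2."""
    return 0.5 * sum(xi * xi for xi in x)

def dual_objective(t, a, b):
    """b*t - psi*(A^T t) for a single constraint a.x = b (m = 1)."""
    return b * t - psi([ai * t for ai in a])

a = [1.0, 2.0, 2.0]   # the 1 x 3 matrix A
b = 3.0

norm2 = sum(ai * ai for ai in a)     # A A^T, a scalar here
t_opt = b / norm2                    # stationary point of the concave dual objective
x_opt = [ai * t_opt for ai in a]     # primal candidate x = A^T t

primal_value = psi(x_opt)
dual_value = dual_objective(t_opt, a, b)
constraint = sum(ai * xi for ai, xi in zip(a, x_opt))
```

The primal and dual values coincide at `x_opt` and `t_opt`; any other feasible point, such as `[3.0, 0.0, 0.0]`, gives a larger value of `psi`, and any other `t` gives a smaller dual objective.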



In order to make the set $\mathcal{M}_g$ closed and the linear functions $Q \in \mathcal{M}_{\mathcal{F}} \mapsto \int_{\mathcal{X}} g_i(x)\, dQ(x)$ continuous (which we need to apply the duality theory and to treat the problem of existence of $\phi$-projections of $P$ on the set $\mathcal{M}_g$), we endow the vector space $\mathcal{M}_{\mathcal{F}}$ with the weak topology, which we denote $\tau_{\mathcal{F}}$, induced by $\mathcal{F} \cup \mathcal{B}_b$, where $\mathcal{F} := \{g_0, g_1, \ldots, g_l\}$ and $\mathcal{B}_b$ is the set of all bounded measurable real valued functions on $\mathcal{X}$; see section 2 below for the precise definition of the $\tau_{\mathcal{F}}$-topology.

Note that the relative interior of the convex set $\mathcal{M}_g$ is generally empty in the weak topology. Borwein and Lewis (1992) have extended the idea of the relative interior (r.i.) of convex sets in $\mathbb{R}^n$ to a new notion which they have called "the quasi relative interior" (q.r.i.) of convex subsets of an arbitrary Hausdorff topological vector space $X$ (having finite or infinite dimension), and they used it to construct a powerful duality theory for the optimization problem of a convex function $\psi : X \to (-\infty, +\infty]$ on convex sets $C \subset X$ subject to linear constraints. In particular, when $X$ is locally convex, they obtain similar results as in (1.16) when the relative interior is replaced by the quasi relative interior; see Borwein and Lewis (1992) Corollary 4.8. The main advantage of the quasi relative interior of a convex subset $C$ of an infinite dimensional vector space $X$ is that it is frequently nonempty even when the relative interior of $C$ is empty.

If $\int_{\mathcal{X}} |g_i(x)|\, dP(x)$ is finite for all $i = 1, \ldots, l$, then the convex conjugate of the convex function $Q \mapsto \phi(Q, P)$ (on the vector space $\mathcal{M}_{\mathcal{F}}(P)$ of all signed finite measures $Q$ a.c. w.r.t. $P$ and which integrate all the elements of $\mathcal{F}$, i.e., all the functions $g_i$) can be written as

$$\phi^*(f) := \sup_{Q \in \mathcal{M}_{\mathcal{F}}(P)} \left\{\int f\, dQ - \phi(Q, P)\right\} = \int \varphi^*(f)\, dP, \quad \text{for all } f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle;$$

see section 4 below for details. So, in this case, as in Borwein and Lewis (1991), we can apply Corollary 4.8 of Borwein and Lewis (1992) to obtain the dual equality inf (1.14) = sup (1.15) with dual attainment, whenever there exists a measure $Q_0$ in $\mathcal{M}_g$ which belongs to the quasi relative interior of $\mathcal{M}_g \cap \mathrm{dom}\,\phi$. This condition is called "constraint qualification". We can prove also from Borwein and Lewis (1992) that a measure $Q_0$ is in the q.r.i. of $\mathcal{M}_g \cap \mathrm{dom}\,\phi$ iff $a_\varphi < \frac{dQ_0}{dP} < b_\varphi$ $P$-almost everywhere ($P$-a.e.). We summarize these results and some other results about the problems of the existence and the characterization of the primal optimal solution (i.e., the $\phi$-projection of $P$ on $\mathcal{M}_g$) in the following two Theorems and two Corollaries. For proofs, see Theorem 3.10 of Borwein and Lewis (1992), Corollary 2.6 and Theorem 4.8 of Borwein and Lewis (1991), and Theorem II.2 of Csiszár et al. (1999).

4 i.e., the interior in the real affine subspace $\langle C \cap \mathrm{dom}\,\psi \rangle$ of $\mathbb{R}^n$ endowed with the relative topology of the usual topology on $\mathbb{R}^n$.



Theorem 1.1. If $\int_{\mathcal{X}} |g_i(x)|\, dP(x)$ is finite for all $i = 1, \ldots, l$, and if the following constraint qualification^5:

(1.17) $$\text{there is a } Q \in \mathcal{M}_g \cap \mathrm{dom}\,\phi \text{ such that } a_\varphi < \frac{dQ}{dP} < b_\varphi \quad (P\text{-a.e.})$$

holds, then inf (1.14) = sup (1.15) and there is attainment in (1.15). Suppose additionally that $\varphi^*$ is essentially smooth (which is equivalent to the strict convexity of $\varphi$), and that there exists a dual optimal solution $\overline{\lambda}$ which is an interior point of

(1.18) $$\mathrm{dom}\,\phi^* := \left\{\lambda \in \mathbb{R}^{1+l} \text{ such that } \int_{\mathcal{X}} \varphi^*\left(\lambda^T g(x)\right) dP(x) \text{ is finite}\right\}.$$

Then the unique optimal solution of (1.14) (i.e., the $\phi$-projection of $P$ on $\mathcal{M}_g$), which we denote by $Q^*$, exists and it is given by

(1.19) $$\frac{dQ^*}{dP}(x) = \varphi^{*\prime}\left(\overline{\lambda}^T g(x)\right).$$

In (1.18), for brevity, the definition of $\mathrm{dom}\,\phi^*$, which usually is the set of functions $f$ such that $\phi^*(f) < \infty$, is modified here.

Remark 1.1. If all functions $g_i$ belong to $L_\infty(\mathcal{X}, P)$, and if for a dual optimal solution $\overline{\lambda}$ the following condition

(1.20) $$a_{\varphi^*} < \operatorname{ess\,inf} \overline{\lambda}^T g(\cdot) \leq \operatorname{ess\,sup} \overline{\lambda}^T g(\cdot) < b_{\varphi^*}$$

holds, then $\overline{\lambda}$ is an interior point of $\mathrm{dom}\,\phi^*$. Hence, under assumption (1.20), all results in the above Theorem hold whenever the constraint qualification (1.17) is met.

If all functions $g_i$ belong to $L_\infty(\mathcal{X}, P)$, and the convex function $\varphi^*$ is everywhere finite (i.e., $a_{\varphi^*} = -\infty$ and $b_{\varphi^*} = +\infty$), then obviously condition (1.20) holds since $\mathrm{dom}\,\phi^* = \mathbb{R}^{1+l}$ in this case. Hence, under the constraint qualification (1.17), all results in the above Theorem hold. We state this result in the following Corollary.

Corollary 1.2. Suppose that all functions $g_i$ belong to $L_\infty(\mathcal{X}, P)$ and $\varphi^*$ is everywhere finite (i.e., $a_{\varphi^*} = -\infty$ and $b_{\varphi^*} = +\infty$). If the constraint qualification (1.17) holds, then inf (1.14) = sup (1.15) and there is attainment in (1.15). Suppose additionally that $\varphi^*$ is everywhere differentiable (which is equivalent to the strict convexity of $\varphi$); then the unique optimal solution $Q^*$ of (1.14) (i.e., the $\phi$-projection of $P$ on $\mathcal{M}_g$) exists and it is given by

(1.21) $$\frac{dQ^*}{dP}(x) = \varphi^{*\prime}\left(\overline{\lambda}^T g(x)\right),$$

where $\overline{\lambda} \in \mathbb{R}^{1+l}$ is any dual optimal solution.
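
The statement of Corollary 1.2 can be illustrated on a finite space with the $\chi^2$-divergence, for which $\varphi(x) = \frac{1}{2}(x-1)^2$ and $\varphi^*(t) = t + t^2/2$ is finite and differentiable everywhere. With one constraint function $g_1$ the dual problem (1.15) is a concave quadratic in $(\lambda_0, \lambda_1)$, so its first-order condition is a linear system. The sketch below is an illustration under these assumptions; the helper names, the uniform $P$ and the values of $g_1$ are hypothetical.

```python
def phi_star_chi2(t):
    """Conjugate of phi(x) = (x - 1)**2 / 2, namely t + t**2 / 2."""
    return t + 0.5 * t * t

def chi2_projection(g1, p):
    """chi^2-projection of p on {Q : Q(X) = 1, sum g1 dQ = 0}, one constraint.

    Hypothetical helper: solves the first-order condition of the dual
    problem, which is linear because phi*(t) = t + t**2 / 2.
    Returns (lam0, lam1, q) with q the list of weights of Q*.
    """
    m1 = sum(pi * gi for pi, gi in zip(p, g1))        # E_P[g1]
    m2 = sum(pi * gi * gi for pi, gi in zip(p, g1))   # E_P[g1^2]
    var = m2 - m1 * m1      # positive when 1 and g1 are linearly independent
    lam0 = m1 * m1 / var
    lam1 = -m1 / var
    # dQ*/dP = phi*'(lam^T g) = 1 + lam0 + lam1 * g1
    q = [pi * (1.0 + lam0 + lam1 * gi) for pi, gi in zip(p, g1)]
    return lam0, lam1, q

p = [0.25, 0.25, 0.25, 0.25]
g1 = [1.0, 1.0, 1.0, 5.0]   # g1 > 0 everywhere: no p.m. can satisfy the constraint

lam0, lam1, q = chi2_projection(g1, p)

total_mass = sum(q)
moment = sum(qi * gi for qi, gi in zip(q, g1))
primal_value = sum(0.5 * (qi / pi - 1.0) ** 2 * pi for qi, pi in zip(q, p))
dual_value = lam0 - sum(pi * phi_star_chi2(lam0 + lam1 * gi) for pi, gi in zip(p, g1))
```

In this example the computed measure has total mass one, satisfies the constraint, and gives equal primal and dual values. It carries a negative weight on the last point, and since $g_1 > 0$ everywhere no probability measure satisfies $\int g_1\, dQ = 0$, so the projection exists only among signed measures.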



5 The strict inequalities in (1.17) mean that $P\left\{\frac{dQ}{dP} \leq a_\varphi\right\} = P\left\{\frac{dQ}{dP} \geq b_\varphi\right\} = 0$.

In the following Theorem and Corollary, we give sufficient conditions for the uniqueness of the dual optimal solution (see Borwein and Lewis (1991) Theorem 4.5 for the proof). Note that the strict convexity of $\varphi^*$ is equivalent to the condition that its conjugate $\varphi$ is essentially smooth.

Theorem 1.3. Suppose that all assumptions of Theorem 1.1 are satisfied. Suppose furthermore that the function $\varphi$ is essentially smooth. Then the dual optimal solution $\overline{\lambda}$ is unique. Moreover, the unique optimal solution $Q^*$ of (1.14) exists and it is given by

(1.22) $$\frac{dQ^*}{dP}(x) = \varphi^{*\prime}\left(\overline{\lambda}^T g(x)\right) = {\varphi'}^{-1}\left(\overline{\lambda}^T g(x)\right).$$

Corollary 1.4. Suppose that all assumptions of Corollary 1.2 are satisfied. Suppose additionally that the function $\varphi$ is essentially smooth. Then the dual optimal solution $\overline{\lambda}$ is unique. Moreover, the unique optimal solution $Q^*$ of (1.14) exists and it is given by

(1.23) $$\frac{dQ^*}{dP}(x) = \varphi^{*\prime}\left(\overline{\lambda}^T g(x)\right) = {\varphi'}^{-1}\left(\overline{\lambda}^T g(x)\right).$$



The important Corollary 1.2, which essentially requires that the constraint qualification (1.17) holds, applies in the $KL$-divergence case since the corresponding conjugate $\varphi^*$ is everywhere finite (see also Borwein and Lewis (1993) for other examples), but it fails in the two important cases of Burg relative entropy ($KL_m$-divergence in the context of divergences) and Hellinger divergence without additional conditions, since the corresponding conjugates $\varphi^*$ are infinite on the intervals $[1, +\infty)$ and $[2, +\infty)$, respectively.
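
The two intervals can be recovered by a direct computation of the conjugates from the definition of the Fenchel-Legendre transform. The following worked derivation is a sketch for these two functions only; the maximizers are $x = 1/(1-t)$ in the first case and $x = 4/(2-t)^2$ in the second.

```latex
\begin{align*}
\text{($KL_m$)}\quad \varphi(x) &= -\log x + x - 1:
& \varphi^*(t) &= \sup_{x > 0}\,\{(t-1)x + \log x + 1\}
= \begin{cases} -\log(1 - t) & \text{if } t < 1,\\ +\infty & \text{if } t \geq 1,\end{cases}\\
\text{(Hellinger)}\quad \varphi(x) &= 2(\sqrt{x} - 1)^2:
& \varphi^*(t) &= \sup_{x \geq 0}\,\{(t-2)x + 4\sqrt{x} - 2\}
= \begin{cases} \dfrac{2t}{2 - t} & \text{if } t < 2,\\ +\infty & \text{if } t \geq 2.\end{cases}
\end{align*}
```

In both cases the expression inside the supremum is unbounded as $x \to +\infty$ once $t$ reaches the threshold, which is why $\varphi^*$ fails to be everywhere finite.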



Léonard (2001b) considers the optimization problem (1.14) when the set $\mathcal{M}_g$ is replaced by the subset

(1.24) $$\mathcal{M}_{os} := \left\{Q \in \mathcal{M}(P) \text{ such that } q := \frac{dQ}{dP} \in L_{\varphi_m^{**}},\ \int_{\mathcal{X}} g(x)\, dQ = (1, 0, \ldots, 0)^T\right\},$$

where $L_{\varphi_m^{**}}$ is the Orlicz space defined as follows:

(1.25) $$L_{\varphi_m^{**}} := \left\{q : \mathcal{X} \to \mathbb{R};\ \text{measurable such that } \|q\|_{\varphi_m^{**}} < \infty\right\}$$

with

$$\|q\|_{\varphi_m^{**}} := \inf\left\{a > 0;\ \int_{\mathcal{X}} \varphi_m^{**}\left(\frac{|q(x)|}{a}\right) dP(x) \leq 1\right\},$$

and $\varphi_m^{**}$ is the convex conjugate of the convex function $\varphi_m^*$ defined by $\varphi_m^*(t) := \max\left(\varphi^*(t), \varphi^*(-t)\right)$ for all $t \in \mathbb{R}$. Without the constraint qualification (1.17), under the following integrability condition

(1.26) $$\text{for any } \lambda \in \mathbb{R}^{1+l}, \quad \int_{\mathcal{X}} \varphi^*\left(\lambda^T g(x)\right) dP(x) < \infty,$$

applying the duality theory on Orlicz spaces, Léonard (2001b) obtains the dual equality

(1.27) $$\inf_{Q \in \mathcal{M}_{os}} \phi(Q, P) = \sup_{\lambda \in \mathbb{R}^{1+l}} \left\{\lambda_0 - \int_{\mathcal{X}} \varphi^*\left(\lambda^T g(x)\right) dP(x)\right\}.$$

Moreover, if the value is finite, then there exists at least one $\phi$-projection of $P$ on $\mathcal{M}_{os}$; see Theorem 3.4 of Léonard (2001b) for details in a more general context. A characterization of the $\phi$-projections of $P$ on the set $\mathcal{M}_{os}$ (with a finite or infinite number of linear constraints) is stated by Léonard (2001d) under condition (1.26); see Theorems 4.4, 4.5 and 4.6 of Léonard (2001d). Note that the integrability condition (1.26) implies that $\varphi^*$ is everywhere finite, and these results apply in the important $KL$-divergence case with a finite or infinite number of linear constraints. However, the condition (1.26) does not hold in the $KL_m$-divergence and Hellinger divergence cases since the domains of the corresponding $\varphi^*$ functions are proper subsets of $\mathbb{R}$, and the important result (1.27) does not apply in these two important cases. Under the weaker integrability assumption

(1.28) $$\text{for any } \lambda \in \mathbb{R}^{1+l}, \text{ there exists } a > 0 \text{ such that } \int_{\mathcal{X}} \varphi^*\left(a \lambda^T g(x)\right) dP(x) + \int_{\mathcal{X}} \varphi^*\left(-a \lambda^T g(x)\right) dP(x) < \infty,$$

the dual equality (1.27) may fail; see Theorem 3.3 of Léonard (2001b).



The goal of the present paper is to give results of existence and characterization of the $\phi$-projections of a given p.m. $P$ on some subsets of $\mathcal{M}_{\mathcal{F}}$, the space of all signed finite measures which integrate a given class $\mathcal{F}$ of functions, in particular, convex sets of signed finite measures defined by linear constraints as in (1.13), extending some previous works (about the existence and characterization of the $\phi$-projections on subsets of $\mathcal{M}^1$, the set of all p.m.'s) of Csiszár (1975), Liese (1977), Csiszár (1984), Rüschendorf (1984), Rüschendorf (1987), Liese and Vajda (1987), Teboulle and Vajda (1993) and Csiszár (1995). We give also different versions of dual representations of the $\phi$-divergences viewed as convex functions on the space of all signed finite measures which integrate an arbitrary class of functions. When the set $\Omega$ is defined by linear constraints as in (1.13), we consider the dual problem, and we obtain the equality inf (1.14) = sup (1.15) with dual attainment under different assumptions without constraint qualification. Additional conditions are given to obtain similar results which apply in the two important $KL_m$-divergence and Hellinger divergence cases.

Enhancing $\mathcal{M}^1$ to $\mathcal{M}$ is motivated by the following arguments: sometimes the $\phi$-projection, say $Q_1^*$, of a p.m. $P$ on a subset of $\mathcal{M}^1$ is not an "interior" point, and we cannot give in this case a definite description of $Q_1^*$, while the $\phi$-projection, say $Q^*$, of a p.m. $P$ on a subset of $\mathcal{M}$ is an "interior" point, which allows us to give a perfect characterization of the $\phi$-projection $Q^*$ (see Example 3.1). In the context of statistical estimation and tests using the empirical likelihood method (see Owen (2001)), or related ones based on criteria defined through divergences (see Broniatowski and Keziou (2004)), the projection of the empirical measure $P_n$ of a sample on a set $\Omega^1$ of p.m.'s may cause problems when the projection is not an interior point of $\Omega^1 \cap \mathrm{dom}\,\phi(\cdot, P_n)$. Enhancing $\mathcal{M}^1$ to $\mathcal{M}$, this difficulty does not hold any longer, and tests as well as estimation can be performed.

The rest of this paper is organized as follows: In section 2, we consider the problem of existence of $\phi$-projections on general closed sets of signed measures. In section 3, we deal with the problem of characterization of the $\phi$-projections on sets of signed measures, in particular, sets of signed measures defined by linear constraints. In section 4, we give different dual representations of $\phi$-divergences seen as convex functions on the vector space of all signed finite measures which integrate a given class of functions. In section 5, we apply the results of sections 2, 3 and 4 to obtain the dual equality inf (1.14) = sup (1.15) with dual attainment, under different assumptions without constraint qualification.



2. Existence of $\phi$-Projections on Sets of signed measures

In this section, we give sufficient conditions for the existence of $\phi$-projections of some p.m. $P$ on sets $\Omega$ of signed finite measures which integrate some class of functions (see Theorems 2.5, 2.6 and 2.7 and Corollary 2.8 below). At first, we give some notation and we establish a convenient topological context for this problem. Let $\mathcal{F}$ be some class of measurable real valued functions $f$ (bounded or unbounded) defined on $\mathcal{X}$. Here, $\mathcal{F}$ is not assumed to be finite. Denote by $\mathcal{B}_b$ the set of all bounded measurable real valued functions defined on $\mathcal{X}$, and by $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$ the linear span of $\mathcal{F} \cup \mathcal{B}_b$. Define the set

$$\mathcal{M}^1_{\mathcal{F}} := \left\{Q \in \mathcal{M}^1 \text{ such that } \int |f|\, dQ < \infty, \text{ for all } f \text{ in } \mathcal{F}\right\},$$

and the real vector space

$$\mathcal{M}_{\mathcal{F}} := \left\{Q \in \mathcal{M} \text{ such that } \int |f|\, d|Q| < \infty, \text{ for all } f \text{ in } \mathcal{F}\right\},$$

in which $|Q|$ denotes the total variation of the signed finite measure $Q$.
Note that if $\mathcal{F} = \mathcal{B}_b$, then $\mathcal{M}^1_{\mathcal{F}} = \mathcal{M}^1$ and $\mathcal{M}_{\mathcal{F}} = \mathcal{M}$.

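Definition 2.1. Denote by $\tau_{\mathcal{F}}$ the weakest topology on $\mathcal{M}_{\mathcal{F}}$ for which all mappings $Q \in \mathcal{M}_{\mathcal{F}} \mapsto \int f\, dQ$ are continuous when $f$ belongs to $\mathcal{F} \cup \mathcal{B}_b$. Denote also by $\tau_{\mathcal{M}}$ the weakest topology on $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$ for which all mappings $f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle \mapsto \int f\, dQ$ are continuous when $Q \in \mathcal{M}_{\mathcal{F}}$. We sometimes call $\tau_{\mathcal{F}}$ the topology induced by $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$ on $\mathcal{M}_{\mathcal{F}}$, and likewise $\tau_{\mathcal{M}}$ the topology induced by $\mathcal{M}_{\mathcal{F}}$ on $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$.

A base of open neighborhoods for any $R$ in $\mathcal{M}_{\mathcal{F}}$ is defined by

(2.1) $$U(R, A, \varepsilon) := \left\{Q \in \mathcal{M}_{\mathcal{F}} \text{ such that } \max_{f \in A} \left|\int f\, dR - \int f\, dQ\right| < \varepsilon\right\}$$

for $\varepsilon > 0$ and $A$ a finite collection of functions in $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$.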
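
On a finite set, membership in a basic neighborhood of the form (2.1) reduces to comparing finitely many integrals. The sketch below is a toy illustration; the dictionary representation of measures and the names `integral`, `in_neighborhood` and `scaled` are assumptions, and on a finite set every function is of course bounded, so `scaled` only stands in for a test function taking large values.

```python
def integral(f, measure):
    """Integral of f w.r.t. a signed measure on a finite set (dict: point -> weight)."""
    return sum(f(x) * w for x, w in measure.items())

def in_neighborhood(q, r, functions, eps):
    """True if q lies in U(r, A, eps) of (2.1), with A the finite list `functions`."""
    return max(abs(integral(f, r) - integral(f, q)) for f in functions) < eps

r = {0: 0.5, 1: 0.5}
q = {0: 0.55, 1: 0.45}

indicator_zero = lambda x: 1.0 if x == 0 else 0.0
identity = lambda x: float(x)
scaled = lambda x: 100.0 * x   # a test function with large values

close_for_two = in_neighborhood(q, r, [indicator_zero, identity], 0.1)
close_with_scaled = in_neighborhood(q, r, [indicator_zero, identity, scaled], 0.1)
```

Adding a test function to the collection $A$ can only shrink the neighborhood: here `q` belongs to the first neighborhood of `r` but not to the second one.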



We refer to Chapter 5 of Dunford and Schwartz (1962) for the various topologies induced by classes of functions. Note that the class $\mathcal{B}_b$ induces the so-called $\tau$-topology (see e.g. Groeneboom et al. (1979) and Gänssler (1971)), and that $\mathcal{M}_{\mathcal{B}_b}$ is the whole vector space $\mathcal{M}$.

The above $\tau_{\mathcal{F}}$-topology on $\mathcal{M}_{\mathcal{F}}$ is indeed the natural and the most convenient one in order to handle projection properties. It has been introduced in the context of large deviation probabilities by Eichelsbacher and Schmock (2002) for the Kullback-Leibler divergence, and it is used in Statistics in Broniatowski (2003), Keziou (2003a), Broniatowski and Keziou (2003) and Keziou (2003b). Usually the sets which are to be considered in statistical applications are not compact but merely closed sets; a typical example is when they are defined by linear constraints as in (1.13). The set $\mathcal{M}_g$ is closed in $\mathcal{M}_{\mathcal{F}}$ endowed with the $\tau_{\mathcal{F}}$-topology if the functions $g_i$ (which may be bounded or unbounded) belong to $\mathcal{F}$; this motivates the choice of the $\tau_{\mathcal{F}}$-topology.



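Proposition 2.1. Equip $\mathcal{M}_{\mathcal{F}}$ with the $\tau_{\mathcal{F}}$-topology and $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$ with the $\tau_{\mathcal{M}}$-topology. Then, $\mathcal{M}_{\mathcal{F}}$ and $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$ are Hausdorff locally convex topological vector spaces. Further, the topological dual space of $\mathcal{M}_{\mathcal{F}}$ is the set of all mappings $Q \mapsto \int f\, dQ$ when $f$ belongs to $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$, and the topological dual space of $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$ is the set of all mappings $f \mapsto \int f\, dQ$ when $Q$ belongs to $\mathcal{M}_{\mathcal{F}}$.

Proof of Proposition 2.1. By Lemma 5.3.3 in Dunford and Schwartz (1962), the vector space $\mathcal{M}_{\mathcal{F}}$ equipped with the $\tau_{\mathcal{F}}$-topology is a Hausdorff locally convex topological space. On the other hand, the set of all mappings $Q \in \mathcal{M}_{\mathcal{F}} \mapsto \int f\, dQ$ when $f$ belongs to $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$ is a total linear space; indeed, for all $Q \in \mathcal{M}_{\mathcal{F}}$, assume that $\int f\, dQ = 0$ for all $f$ in $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$, and choose $f = 1_B$ for any $B \in \mathcal{B}$ to conclude that $Q = 0$. The proof ends then as a consequence of Theorem 5.3.9 in Dunford and Schwartz (1962). ■

We denote by $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$ and by $[\langle \mathcal{F} \cup \mathcal{B}_b \rangle; \tau_{\mathcal{M}}]$ the two Hausdorff locally convex topological vector spaces endowed with the $\tau_{\mathcal{F}}$-topology and the $\tau_{\mathcal{M}}$-topology, respectively.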

Broniatowski and Keziou (2003) have proved that the function $Q \in [\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}] \mapsto \phi(Q, P)$ is lower semi-continuous (l.s.c.), provided only that the corresponding convex function $\varphi$ is closed; see Proposition 2.3 of Broniatowski and Keziou (2003) and Proposition 2.1 of Keziou (2003a), which we recall here for convenience.

Proposition 2.2. For any $\phi$-divergence, the divergence function $Q \mapsto \phi(Q, P)$ from $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$ to $[0, +\infty]$ is l.s.c.



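We will use the following Lemma to prove Proposition 2.2.

Lemma 2.3. Let $\mathcal{M}_{\mathcal{F}}(P)$ denote the vector subspace of all signed measures in $\mathcal{M}_{\mathcal{F}}$ which are absolutely continuous w.r.t. $P$. The vector subspace $\mathcal{M}_{\mathcal{F}}(P)$ is a closed set in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$.

Proof of Lemma 2.3. Let $\overline{\mathcal{M}_{\mathcal{F}}(P)}$ denote the closure of $\mathcal{M}_{\mathcal{F}}(P)$ in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$. Assume that there exists $R$ in $\overline{\mathcal{M}_{\mathcal{F}}(P)}$ with $R$ not in $\mathcal{M}_{\mathcal{F}}(P)$. Then, there exists some $B$ in $\mathcal{B}$ such that $P(B) = 0$ and $R(B) \neq 0$. On the other hand, for all $n$ in $\mathbb{N}$, the set $U := U(R, 1_B, 1/n)$ is a neighborhood of $R$ (see (2.1)); hence, $U \cap \mathcal{M}_{\mathcal{F}}(P)$ is nonempty. Therefore, we can construct a sequence of measures $R_n$ in $\mathcal{M}_{\mathcal{F}}(P)$ such that

$$\left|\int 1_B\, dR - \int 1_B\, dR_n\right| < 1/n.$$

Since $R_n(B) = 0$ for all $n$ in $\mathbb{N}$, we deduce that $R(B) = 0$, a contradiction. This implies that $\overline{\mathcal{M}_{\mathcal{F}}(P)} = \mathcal{M}_{\mathcal{F}}(P)$, that is, $\mathcal{M}_{\mathcal{F}}(P)$ is closed in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$. This concludes the proof of Lemma 2.3.

Remark 2.1. Note that if $\mathcal{F} = \mathcal{B}_b$, then $\mathcal{M}_{\mathcal{F}} = \mathcal{M}$ and $\mathcal{M}_{\mathcal{F}}(P) = \mathcal{M}(P)$. Hence, we deduce from Lemma 2.3 that the subspace $\mathcal{M}(P)$ is closed in $[\mathcal{M}; \tau]$, the space of all signed finite measures endowed with the $\tau$-topology. Note also that $\mathcal{M}^1_{\mathcal{F}}$ and $\mathcal{M}^1_{\mathcal{F}}(P)$ are closed in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$, and that $\mathcal{M}^1$ and $\mathcal{M}^1(P)$ are closed in $[\mathcal{M}; \tau]$.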

Proof of Proposition 2.2. Let $a$ be a real number. We prove that the set

$$A(a) := \{Q \in \mathcal{M}_{\mathcal{F}} \text{ such that } \phi(Q, P) \leq a\}$$

is closed in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$. By Lemma 2.3, $\mathcal{M}_{\mathcal{F}}(P)$ is closed in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$. Since $A(a)$ is included in $\mathcal{M}_{\mathcal{F}}(P)$, we have to prove that $A(a)$ is closed in the subspace $[\mathcal{M}_{\mathcal{F}}(P); \tau_{\mathcal{F}}]$. Let

$$B(a) := \left\{f \in L_1(\mathcal{X}, P) \text{ such that } \int \varphi(f(x))\, dP(x) \leq a\right\}.$$

$B(a)$ is a convex set, since $\varphi$ is a convex function. Furthermore, $B(a)$ is closed in $L_1(\mathcal{X}, P)$. Indeed, let $f_n$ be a sequence in $B(a)$ with $\lim_{n \to \infty} f_n = f^*$, where the limit is intended in $L_1(\mathcal{X}, P)$. Hence, there exists a subsequence $f_{n_k}$ which converges to $f^*$ ($P$-a.e.). The functions $\varphi(f_{n_k})$ are nonnegative. Further, we have $\liminf_{k \to +\infty} \varphi(f_{n_k}(x)) \geq \varphi(f^*(x))$ ($P$-a.e.) by the closedness of the convex function $\varphi$. Therefore, Fatou's Lemma implies

$$\int \varphi(f^*)\, dP \leq \int \liminf_{k \to +\infty} \varphi(f_{n_k})\, dP \leq \liminf_{k \to +\infty} \int \varphi(f_{n_k})\, dP \leq a,$$

which is to say that $f^*$ belongs to $B(a)$. Hence, $B(a)$ is a closed subset in $L_1(\mathcal{X}, P)$. Since it is convex, it is then weakly closed in $L_1(\mathcal{X}, P)$; see e.g. Theorem 5.3.13 in Dunford and Schwartz (1962). Denote by $W$ the weak topology on $L_1(\mathcal{X}, P)$ and consider the mapping $H$ defined by

$$H : [\mathcal{M}_{\mathcal{F}}(P); \tau_{\mathcal{F}}] \to [L_1(\mathcal{X}, P); W], \qquad Q \mapsto H(Q) = dQ/dP.$$

Let us prove that $H$ is weakly continuous, that is, $Q \mapsto \int H(Q)\, g\, dP$ is a continuous mapping for all $g$ in $L_\infty(\mathcal{X}, P)$. Indeed, let $g$ be some function in $L_\infty(\mathcal{X}, P)$. Then, we have

$$\int H(Q)\, g\, dP = \int (dQ/dP)\, g\, dP = \int g\, dQ.$$

The mapping $Q \mapsto \int g\, dQ$ is $\tau_{\mathcal{F}}$-continuous; indeed, for all $g$ in $L_\infty(\mathcal{X}, P)$, it holds $P(|g| > \|g\|_\infty) = 0$, which implies $Q(|g| > \|g\|_\infty) = 0$, for all $Q$ in $\mathcal{M}_{\mathcal{F}}(P)$. Therefore, $\int g\, dQ = \int g\, 1_{[|g| \leq \|g\|_\infty]}\, dQ$. Now, the mapping $Q \mapsto \int g\, 1_{[|g| \leq \|g\|_\infty]}\, dQ$ is continuous in the $\tau_{\mathcal{F}}$-topology since $g\, 1_{[|g| \leq \|g\|_\infty]} \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle$. Since $A(a) = \{Q \in \mathcal{M}_{\mathcal{F}}(P),\ \phi(Q, P) \leq a\} = H^{-1}(B(a))$, we deduce that $A(a)$ is closed in $[\mathcal{M}_{\mathcal{F}}(P); \tau_{\mathcal{F}}]$, for any $a$ in $\mathbb{R}$. This proves Proposition 2.2. ■



For any $\phi$-divergence, by the lower semi-continuity of the function $Q \in [\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}] \mapsto \phi(Q, P)$, the following result holds.

Theorem 2.4. Let $P$ be some p.m. and $\Omega$ some compact subset of $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$. Then there exists at least one $\phi$-projection of $P$ on $\Omega$.

Using arguments similar to those used in the proof of Theorem 2.4 in Liese (1977) or Proposition 8.5 in Liese and Vajda (1987), and Fenchel's inequality or Hölder's inequality, we state general results for the existence of $\phi$-projections of some p.m. $P$ on closed sets $\Omega$ of $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$ (see Theorems 2.6 and 2.7 below). At first, in the following Theorem, we give a version of Theorem 2.4 in Liese (1977) or Proposition 8.5 in Liese and Vajda (1987).



Theorem 2.5. Let $\Omega$ be some closed set in $[\mathcal{M}; \tau]$. Assume that the following two conditions

(2.2) $$\phi(\Omega, P) := \inf_{Q \in \Omega} \phi(Q, P) < \infty$$ ^6

and

(2.3) $$\lim_{|x| \to \infty} \frac{\varphi(x)}{|x|} = +\infty$$

hold. Then there exists at least one $\phi$-projection of $P$ on $\Omega$.

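Proof of Theorem 2.5. Denote $m := \phi(\Omega, P)$, which is finite by assumption, and let $\beta$ be a positive number. Define the sets

$$\Omega(\beta) := \{Q \in \Omega \text{ such that } \phi(Q,P) \le m + \beta\}$$

and

$$A(\beta) := \left\{ q := \frac{dQ}{dP} \text{ such that } Q \in \Omega(\beta) \right\}.$$

The set $A(\beta)$ is uniformly integrable: the integrals $\int \varphi(q)\,dP$ are bounded by $m + \beta$ on $A(\beta)$, and $\varphi$ grows faster than linearly by (2.3). Hence, it is weakly sequentially compact in $L_1(X,P)$ (see e.g. Meyer (1966) p. 39). Consider now a sequence $Q_n$ in $\Omega(\beta)$ such that

$$\lim_{n \to +\infty} \phi(Q_n, P) = \phi(\Omega, P).$$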



Footnote 3. Note that condition (2.2) is equivalent to the condition: there exists $Q \in \Omega$ such that $\phi(Q,P)$ is finite.
The sequence $q_n := dQ_n/dP$ belongs to $A(\beta)$. Therefore, there exists a subsequence $(q_{n_i})_{i \in \mathbb{N}}$ which converges weakly in $L_1(X,P)$ to some function $q^* \in L_1(X,P)$, which is to say that the corresponding sequence of signed finite measures $Q_{n_i}$ converges to $Q^* \in \mathcal{M}(P)$ in $\tau$-topology, where $Q^*$ is defined by $dQ^*/dP := q^*$. Hence, $Q^*$ belongs to $\Omega$, since it is the limit in $\tau$-topology of the sequence $(Q_{n_i})$, which belongs to the $\tau$-closed set $\Omega$. On the other hand, the mapping $Q \in [\mathcal{M}; \tau] \mapsto \phi(Q,P)$ is l.s.c. (see footnote 4), and therefore



$$\phi(Q^*, P) \le \lim_{i \to +\infty} \phi(Q_{n_i}, P) = \phi(\Omega, P) < \infty. \tag{2.4}$$

We deduce that $Q^*$ is a $\phi$-projection of $P$ on $\Omega$.



Remark 2.2. For sets $\Omega$ of p.m.'s defined by linear constraints, sufficient conditions for the existence of $KL$-projections are presented in Csiszár (1975) (Theorem 3.1, Corollary 3.1 and Theorem 3.3). Sufficient conditions for the existence of $\phi$-projections on sets of p.m.'s satisfying linear equality or inequality constraints are given in Csiszár (1995), Theorem 3.



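Remark 2.3. By Eichelsbacher and Schmock (2002), if for all $\alpha > 0$ and all $f \in \mathcal{F}$, $\int \exp(\alpha |f|)\,dP < \infty$, then the level sets

$$\{Q \in \mathcal{M}^1_{\mathcal{F}} \text{ such that } KL(Q,P) \le c\}$$

are compact in $[\mathcal{M}^1_{\mathcal{F}}; \tau_{\mathcal{F}}]$ for all real $c$. Therefore, for any $\tau_{\mathcal{F}}$-closed set $\Omega \subset \mathcal{M}^1_{\mathcal{F}}$ for which $KL(\Omega,P) < \infty$, the $KL$-projection of $P$ on $\Omega$ exists; see Eichelsbacher and Schmock (2002), Lemma 2.1.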



Using Fenchel's inequality and arguments similar to those in Lemma 2.1 of Eichelsbacher and Schmock (2002), we generalize Theorem 3 of Csiszár (1995) and the result in Remark 2.3 about the existence of projections to the class of $\phi$-divergences and to $\tau_{\mathcal{F}}$-closed sets of signed measures.



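Theorem 2.6. Let $\Omega$ be some closed set in $\mathcal{M}_{\mathcal{F}}$ equipped with the $\tau_{\mathcal{F}}$-topology. Suppose that the following three assumptions

$$\phi(\Omega, P) < \infty, \tag{2.5}$$

$$\lim_{|x| \to \infty} \frac{\varphi(x)}{|x|} = +\infty, \tag{2.6}$$

$$\text{and for every } f \in \mathcal{F} \text{ and every } \alpha > 0, \quad \int \varphi^*(\alpha |f|)\,dP < \infty \tag{2.7}$$

hold. Then there exists at least one $\phi$-projection of $P$ on $\Omega$.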
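To illustrate condition (2.7), the following worked computation treats the Kullback-Leibler case. It is a sketch under the assumption, not restated in this section, that the $KL$ generator is $\varphi(x) = x\log x - x + 1$ on $[0,+\infty[$ and $+\infty$ elsewhere; under that assumption (2.7) reduces to the exponential moment condition of Remark 2.3.

```latex
% Assumption: \varphi(x) = x\log x - x + 1 for x \ge 0, and +\infty for x < 0.
% The supremum below is attained at x = e^{y}.
\[
\varphi^*(y) \;=\; \sup_{x \ge 0}\,\bigl\{\, xy - x\log x + x - 1 \,\bigr\} \;=\; e^{y} - 1 .
\]
% Condition (2.7) therefore reads
\[
\int \varphi^*(\alpha |f|)\,dP \;=\; \int e^{\alpha |f|}\,dP \;-\; 1 \;<\; \infty
\qquad \text{for every } f \in \mathcal{F} \text{ and every } \alpha > 0 .
\]
```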



Proof of Theorem 2.6. As shown in the proof of Theorem 2.5, under assumptions (2.5) and (2.6), there exists a sequence $(Q_{n_i})_{i \in \mathbb{N}}$ in $\Omega(\beta) \subset \Omega$ that converges in $\tau$-topology to some $Q^*$ in $\mathcal{M}(P)$ satisfying

$$\phi(Q^*, P) \le \lim_{i \to \infty} \phi(Q_{n_i}, P) = \phi(\Omega, P) < \infty. \tag{2.8}$$



Footnote 4. This holds from Proposition 2.2, choosing the class of functions $\mathcal{F} = \mathcal{B}_b$, the class of all bounded measurable real valued functions.



MINIMIZATION OF DIVERGENCES ON SETS OF SIGNED MEASURES 






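It remains to prove that $Q^*$ belongs to $\Omega$. First, we prove that $Q^*$ belongs to $\mathcal{M}_{\mathcal{F}}$. So, let $f$ be in $\mathcal{F}$. Denote by $Q^*_+$ the nonnegative variation and by $Q^*_-$ the nonpositive variation of $Q^*$: $Q^* = Q^*_+ - Q^*_-$, with respective densities $q^*_+$ and $q^*_-$ w.r.t. $P$. Using Fenchel's inequality through the integral we can write

$$\int |f|\,dQ^*_+ = \int |f|\, q^*_+\,dP \le \int \varphi(q^*_+)\,dP + \int \varphi^*(|f|)\,dP \le \int \varphi(q^*)\,dP + \int \varphi^*(|f|)\,dP = \phi(Q^*,P) + \int \varphi^*(|f|)\,dP, \tag{2.9}$$

and similarly

$$\int |f|\,dQ^*_- \le \phi(Q^*,P) + \int \varphi^*(|f|)\,dP. \tag{2.10}$$

Hence, from (2.9) and (2.10), we deduce $\int |f|\,d|Q^*| < \infty$ since

$$\int |f|\,d|Q^*| = \int |f|\,dQ^*_+ + \int |f|\,dQ^*_-.$$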



Hence $Q^*$ belongs to $\mathcal{M}_{\mathcal{F}}$. We still have to prove that $Q^*$ belongs to $\Omega$. Since $\Omega$ is, by assumption, a closed set in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$, it is enough to show that the sequence $(Q_{n_i})_i$ (which belongs to $\Omega(\beta) \subset \Omega$) converges to $Q^*$ in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$. Note that the sequence $Q_{n_i}$ converges to $Q^*$ in $\tau$-topology. Hence, we still have to prove that $\int f\,dQ_{n_i}$ converges to $\int f\,dQ^*$ for all $f$ in $\mathcal{F}$. So, let $f$ be in $\mathcal{F}$. We now use an argument similar to that in the proof of Lemma 2.1 in Eichelsbacher and Schmock (2002). Let $\varepsilon > 0$. Define $\alpha = (m + \beta)/\varepsilon$. Using the fact that $\varphi^*(0) = 0$, by condition (2.7) and the dominated convergence theorem, there exists $j_0 \in \mathbb{N}$ such that

$$\frac{1}{\alpha} \int \varphi^*\left(\alpha |f| 1_{\{|f| > j\}}\right) dP < \varepsilon$$

for all $j \ge j_0$. Hence, using Fenchel's inequality and the fact that the sequence $(Q_{n_i})_i$ belongs to $\Omega(\beta)$, we can write



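$$\begin{aligned}
\left| \int f\,dQ_{n_i} - \int f 1_{\{|f| \le j\}}\,dQ_{n_i} \right|
&\le \int \left| f - f 1_{\{|f| \le j\}} \right| d|Q_{n_i}| = \frac{1}{\alpha} \int \alpha |f| 1_{\{|f| > j\}}\,d|Q_{n_i}| \\
&\le \frac{2}{\alpha} \left[ \phi(Q_{n_i}, P) + \int \varphi^*\left(\alpha |f| 1_{\{|f| > j\}}\right) dP \right] \\
&\le \frac{2}{\alpha}(m + \beta) + \frac{2}{\alpha} \int \varphi^*\left(\alpha |f| 1_{\{|f| > j\}}\right) dP \\
&\le 2\varepsilon + 2\varepsilon = 4\varepsilon.
\end{aligned} \tag{2.11}$$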



We have just proved that, for all $f \in \mathcal{F}$ and all $\varepsilon > 0$, there exists $j_0 \in \mathbb{N}$ such that for all $j \ge j_0$ and all $i \in \mathbb{N}$,

$$\int f 1_{\{|f| \le j\}}\,dQ_{n_i} - 4\varepsilon \le \int f\,dQ_{n_i} \le \int f 1_{\{|f| \le j\}}\,dQ_{n_i} + 4\varepsilon. \tag{2.12}$$

Using the fact that the sequence $(Q_{n_i})_i$ converges to $Q^*$ in $\tau$-topology, by passage to limits in (2.12) when $i \to \infty$, then when $j \to \infty$ and finally when $\varepsilon \to 0$, we get $\lim_{i \to \infty} \int f\,dQ_{n_i} = \int f\,dQ^*$. Hence, the sequence $(Q_{n_i})_i$ converges to $Q^*$ in $\tau_{\mathcal{F}}$-topology, which implies that $Q^*$ belongs to $\Omega$ since $\Omega$ is closed in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$. From inequality (2.8), we conclude that $Q^*$ is a $\phi$-projection of $P$ on $\Omega$. This completes the proof. ■

Using Hölder's inequality, we give in the following theorem another existence result for $\phi$-projections on closed sets in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$. In the sequel, $\|\cdot\|_k$ denotes the usual norm of the vector space $L_k(X,P)$, $1 \le k < +\infty$.



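Theorem 2.7. Let $\Omega$ be some closed set in $\mathcal{M}_{\mathcal{F}}$ equipped with the $\tau_{\mathcal{F}}$-topology. Assume that the following conditions

$$\phi(\Omega, P) < \infty, \tag{2.13}$$

$$\text{there exist numbers } 1 < r, k < +\infty \text{ such that } r^{-1} + k^{-1} = 1, \quad \lim_{|x| \to \infty} \frac{\varphi(x)}{|x|^r} > 0 \quad \text{and, for every } f \in \mathcal{F},\ \|f\|_k < \infty \tag{2.14}$$

hold. Then there exists at least one $\phi$-projection of $P$ on $\Omega$.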



Proof of Theorem 2.7. Since condition (2.14) implies (2.6), as in the proof of Theorem 2.5, there exists a sequence $(Q_{n_i})_{i \in \mathbb{N}}$ in $\Omega(\beta) \subset \Omega$ that converges in $\tau$-topology to some $Q^*$ in $\mathcal{M}(P)$ satisfying

$$\phi(Q^*, P) \le \lim_{i \to +\infty} \phi(Q_{n_i}, P) = \phi(\Omega, P) < \infty. \tag{2.15}$$

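We have to prove that $Q^*$ belongs to $\Omega$. First, we prove that $Q^*$ belongs to $\mathcal{M}_{\mathcal{F}}$. By (2.14), there exist constants $c_0, c_1 > 0$ such that $|x| \le c_1 \varphi(x)^{1/r}$ whenever $|x| > c_0$. For all $f$ in $\mathcal{F}$, we have

$$\begin{aligned}
\int |f|\,d|Q^*| &= \int |f| |q^*|\,dP \\
&= \int |f| |q^*| 1_{\{|q^*| \le c_0\}}\,dP + \int |f| |q^*| 1_{\{|q^*| > c_0\}}\,dP \\
&\le c_0 \int |f|\,dP + \int |f| \frac{|q^*|}{\varphi(q^*)^{1/r}}\, \varphi(q^*)^{1/r} 1_{\{|q^*| > c_0\}}\,dP \\
&\le c_0 \int |f|\,dP + c_1 \left( \int |f|^k\,dP \right)^{1/k} \left( \int \varphi(q^*)\,dP \right)^{1/r} \\
&= c_0 \int |f|\,dP + c_1 \left( \int |f|^k\,dP \right)^{1/k} \left( \phi(Q^*,P) \right)^{1/r}.
\end{aligned} \tag{2.16}$$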



Hence, from (2.15) and (2.14), we deduce $\int |f|\,d|Q^*| < \infty$. We still have to prove that $Q^*$ belongs to $\Omega$. Since $\Omega$ is, by assumption, a closed set in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$, it is enough to show that the sequence $(Q_{n_i})_i$ (which belongs to $\Omega(\beta) \subset \Omega$) converges to $Q^*$ in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$. Note that the sequence $Q_{n_i}$ converges to $Q^*$ in $\tau$-topology. Hence, we still have to prove that $\int f\,dQ_{n_i}$ converges to $\int f\,dQ^*$ for all $f$ in $\mathcal{F}$. So, let $f$ be in $\mathcal{F}$. For every positive number $b$, using (2.14), we can write
$\int f\,dQ_{n_i} = \int f 1_{\{|f| \le b\}}\,dQ_{n_i} + \int f 1_{\{|f| > b\}}\,dQ_{n_i} =: A + B$, and



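$$\begin{aligned}
|B| &= \left| \int f 1_{\{|f| > b\}}\,dQ_{n_i} \right| \le \int |f| 1_{\{|f| > b\}}\,d|Q_{n_i}| = \int |f| 1_{\{|f| > b\}} |q_{n_i}|\,dP \\
&= \int |f| 1_{\{|f| > b\}} |q_{n_i}| 1_{\{|q_{n_i}| \le c_0\}}\,dP + \int |f| 1_{\{|f| > b\}} |q_{n_i}| 1_{\{|q_{n_i}| > c_0\}}\,dP \\
&\le c_0 \int |f| 1_{\{|f| > b\}}\,dP + \int |f| 1_{\{|f| > b\}} \frac{|q_{n_i}|}{\varphi(q_{n_i})^{1/r}}\, \varphi(q_{n_i})^{1/r} 1_{\{|q_{n_i}| > c_0\}}\,dP \\
&\le c_0 \int |f| 1_{\{|f| > b\}}\,dP + c_1 \int |f| 1_{\{|f| > b\}}\, \varphi(q_{n_i})^{1/r}\,dP \\
&\le c_0 \int |f| 1_{\{|f| > b\}}\,dP + c_1 \left( \int |f|^k 1_{\{|f| > b\}}\,dP \right)^{1/k} \left( \int \varphi(q_{n_i})\,dP \right)^{1/r}.
\end{aligned}$$

We deduce

$$(B1) \le \int f\,dQ_{n_i} \le (B2), \tag{2.17}$$

with

$$(B1) := \int f 1_{\{|f| \le b\}}\,dQ_{n_i} - c_0 \int |f| 1_{\{|f| > b\}}\,dP - c_1 \left( \int |f|^k 1_{\{|f| > b\}}\,dP \right)^{1/k} \left( \int \varphi(q_{n_i})\,dP \right)^{1/r}$$

and

$$(B2) := \int f 1_{\{|f| \le b\}}\,dQ_{n_i} + c_0 \int |f| 1_{\{|f| > b\}}\,dP + c_1 \left( \int |f|^k 1_{\{|f| > b\}}\,dP \right)^{1/k} \left( \int \varphi(q_{n_i})\,dP \right)^{1/r}.$$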

The functions $\{f_b := |f| 1_{\{|f| > b\}};\ b > 0\}$ and $\{f_b^k := |f|^k 1_{\{|f| > b\}};\ b > 0\}$ are dominated respectively by $|f|$ and $|f|^k$. Moreover, $\int |f|\,dP$ and $\int |f|^k\,dP$ are finite by assumption (2.14). We thus get by the dominated convergence theorem

$$\lim_{b \to +\infty} \int |f| 1_{\{|f| > b\}}\,dP = \lim_{b \to +\infty} \int |f|^k 1_{\{|f| > b\}}\,dP = 0.$$

Hence, from (2.17), we get

$$\int f\,dQ^* = \lim_{b \to +\infty} \lim_{i \to +\infty} (B1) \le \lim_{i \to +\infty} \int f\,dQ_{n_i} \le \lim_{b \to +\infty} \lim_{i \to +\infty} (B2) = \int f\,dQ^*,$$

which is to say that the subsequence $(Q_{n_i})_i$ converges to $Q^*$ in $\tau_{\mathcal{F}}$-topology. Hence, $Q^*$ belongs to $\Omega$. From inequality (2.15), we deduce that $Q^*$ is a $\phi$-projection of $P$ on $\Omega$. This ends the proof of Theorem 2.7. ■



Note that the above results do not apply in the case of $KL_m$ and Hellinger divergences since the condition $\lim_{|x| \to \infty} \varphi(x)/|x| = +\infty$ does not hold. The following corollary applies without the assumption $\lim_{|x| \to \infty} \varphi(x)/|x| = +\infty$, in particular in the $KL_m$ and Hellinger divergence cases.
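The growth condition can be checked numerically. The sketch below evaluates $\varphi(x)/|x|$ for large positive $x$; the four generators are the standard forms of the $\chi^2$, $KL$, $KL_m$ and Hellinger divergences, which are assumed here since this section does not restate them, and the function names are ours.

```python
import math

# Standard convex generators on x > 0 (assumed forms, not restated in this section).
def phi_chi2(x):
    return 0.5 * (x - 1.0) ** 2               # chi-square: (x - 1)^2 / 2

def phi_kl(x):
    return x * math.log(x) - x + 1.0          # KL: x log x - x + 1

def phi_klm(x):
    return -math.log(x) + x - 1.0             # modified KL: -log x + x - 1

def phi_hellinger(x):
    return 2.0 * (math.sqrt(x) - 1.0) ** 2    # Hellinger: 2 (sqrt(x) - 1)^2

def growth_ratio(phi, x):
    """Return phi(x) / |x|, the quantity whose limit appears in (2.3)."""
    return phi(x) / abs(x)

if __name__ == "__main__":
    generators = [("chi2", phi_chi2), ("KL", phi_kl),
                  ("KL_m", phi_klm), ("Hellinger", phi_hellinger)]
    for name, phi in generators:
        ratios = [growth_ratio(phi, 10.0 ** k) for k in (2, 4, 8)]
        print(name, [round(r, 4) for r in ratios])
```

The ratio keeps growing for $\chi^2$ and $KL$, while it stays bounded for $KL_m$ and Hellinger, which is why the superlinear growth assumption fails for the latter two.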



Corollary 2.8. Let $\Omega$ be a closed set in $[\mathcal{M}; \tau]$. If the following condition:

$$\text{there exist } u, l \in L_1(X,P) \text{ such that } u \le \frac{dQ}{dP} \le l \ (P\text{-a.e.}) \text{ for all } Q \in \Omega \cap \mathrm{dom}\,\phi \tag{2.18}$$

holds, then there exists at least one $\phi$-projection of $P$ on $\Omega$ whenever $\phi(\Omega,P)$ is finite.



Proof of Corollary 2.8. Similar to that of Theorem 2.5. The uniform integrability of the set $A(\beta)$ holds by condition (2.18). ■
3. Characterization of $\phi$-projections on sets of signed measures

In this section, we extend known results pertaining to the characterization of the $\phi$-projections as can be found in Rüschendorf (1984), Rüschendorf (1987), Liese and Vajda (1987) (see also Csiszár (1975) and Csiszár (1984) for the characterization of $KL$-projections). These authors have characterized the $\phi$-projections on subsets of $\mathcal{M}^1$. We present similar results for subsets of $\mathcal{M}$ and take the occasion to clarify some proofs. We first consider the case of general subsets $\Omega$ of $\mathcal{M}$ and then the case of convex subsets of $\mathcal{M}$ defined by linear constraints. For the whole section, we assume that the convex function $\varphi$ is differentiable.



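3.1. On general sets $\Omega$. We will use the following assumption:

$$\begin{array}{l}
\text{There exists } 0 < \delta < 1 \text{ such that for all } c \text{ in } [1 - \delta, 1 + \delta], \\
\text{we can find numbers } c_1, c_2, c_3 \text{ such that} \\
\varphi(cx) \le c_1 \varphi(x) + c_2 |x| + c_3, \text{ for all real } x.
\end{array} \tag{3.1}$$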



Remark 3.1. Condition (3.1) holds for all power divergences including $KL$, $KL_m$ and Hellinger divergences. Note also that condition (3.1) implies that $a_\varphi$ equals $0$ or $-\infty$ and $b_\varphi$ equals $+\infty$.
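As a worked illustration of condition (3.1), consider the Kullback-Leibler case. The computation below assumes the standard generator $\varphi(x) = x \log x - x + 1$ on $[0,+\infty[$ (with $\varphi(0) = 1$ and $\varphi = +\infty$ on the negative half-line), which is not restated in this section; the constants it produces are one admissible choice, not the only one.

```latex
% Assumption: \varphi(x) = x\log x - x + 1 for x \ge 0, +\infty for x < 0.
% For c > 0 and x > 0, expand \varphi(cx) and regroup around c\,\varphi(x):
\[
\varphi(cx) = cx\log x + cx\log c - cx + 1
            = c\,\varphi(x) + (c\log c)\,x + (1 - c).
\]
% The identity also holds at x = 0 (both sides equal 1),
% and both sides are +\infty for x < 0. Hence (3.1) holds with
\[
c_1 = c, \qquad c_2 = |c\log c|, \qquad c_3 = |1 - c| .
\]
```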

Remark 3.2. In all the sequel, condition (3.1) above can be replaced by any other condition which implies part (1) of Lemma 3.1 below.



We first give two lemmas, which we will use in the proofs of Theorem 3.3 and Theorem 3.4 below.

Lemma 3.1. Assume that (3.1) holds. Then, for all $Q$ in $\mathcal{M}$ such that $\phi(Q,P)$ is finite, we have

(1) for any $c$ in $[1 - \delta, 1 + \delta]$, $\varphi\left(c \frac{dQ}{dP}\right)$ belongs to $L_1(X,P)$;

(2) $\lim_{c \uparrow 1} \phi(cQ, P) = \phi(Q,P) = \lim_{c \downarrow 1} \phi(cQ, P)$.



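Proof of Lemma 3.1. (1) Under condition (3.1), for all $Q$ in $\mathcal{M}$ such that $\phi(Q,P) < \infty$, we have

$$\varphi\left(c \frac{dQ}{dP}\right) \le c_1 \varphi\left(\frac{dQ}{dP}\right) + c_2 \left|\frac{dQ}{dP}\right| + c_3.$$

Integrating with respect to $P$ yields

$$\int \varphi\left(c \frac{dQ}{dP}\right) dP \le c_1 \phi(Q,P) + c_2 \int \left|\frac{dQ}{dP}\right| dP + c_3 < \infty.$$

(2) For all $c$ in $[1 - \delta, 1 + \delta]$, define the functions

$$\begin{aligned}
l_c &: x \in \mathbb{R} \mapsto l_c(x) := \varphi(cx)\, 1_{]-\infty, 0[}(cx), \\
g_c &: x \in \mathbb{R} \mapsto g_c(x) := \varphi(cx)\, 1_{[0,1]}(cx), \\
h_c &: x \in \mathbb{R} \mapsto h_c(x) := \varphi(cx)\, 1_{]1, +\infty[}(cx).
\end{aligned}$$

For any $c$ and $x$, we have $\varphi(cx) = l_c(x) + g_c(x) + h_c(x)$. For all real $x$, the functions $c \mapsto l_c(x)$ and $c \mapsto h_c(x)$ are nondecreasing, and the function $c \mapsto g_c(x)$ is nonincreasing. Denote $q := \frac{dQ}{dP}$.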



Apply the monotone convergence theorem to get

$$\lim_{c \uparrow 1} \int l_c(q)\,dP = \int l_1(q)\,dP \quad \text{and} \quad \lim_{c \uparrow 1} \int h_c(q)\,dP = \int h_1(q)\,dP.$$

On the other hand, the class of functions $\{x \mapsto g_c(x),\ c \text{ in } [1 - \delta, 1 + \delta]\}$ is bounded above by the function $x \mapsto g_{1-\delta}(x)$. Furthermore, for all $Q$ in $\mathcal{M}$ with $\phi(Q,P)$ finite, $g_{1-\delta}(q)$ belongs to $L_1(X,P)$ by condition (3.1). Hence, applying the monotone convergence theorem we get

$$\lim_{c \uparrow 1} \int g_c(q)\,dP = \int g_1(q)\,dP.$$

Those three limits prove the first part of the claim. The same argument completes the proof of the lemma. ■

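Lemma 3.2. Assume that condition (3.1) holds. Then, for all $Q$ in $\mathrm{dom}\,\phi$, $\varphi'(q) q$ belongs to $L_1(X,P)$, where $q := \frac{dQ}{dP}$.

Proof of Lemma 3.2. Using the convexity of the function $\varphi$, for all $\varepsilon > 0$, we have

$$\frac{\varphi(q) - \varphi((1 - \varepsilon) q)}{\varepsilon} \le \varphi'(q)\, q \le \frac{\varphi((1 + \varepsilon) q) - \varphi(q)}{\varepsilon}.$$

By Lemma 3.1, for all $\varepsilon$ satisfying $0 < \varepsilon < \delta$, both the LHS and the RHS terms belong to $L_1(X,P)$, and hence $\varphi'(q) q \in L_1(X,P)$. ■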
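The two-sided convexity bound used in this proof can be checked pointwise. The following sketch does so for the $KL$ generator $\varphi(x) = x \log x - x + 1$, an assumed standard form that this section does not restate; the helper name `sandwich` is ours.

```python
import math

def phi(x):
    """KL generator (assumed form): x log x - x + 1 for x > 0."""
    return x * math.log(x) - x + 1.0

def dphi(x):
    """Derivative of the KL generator: log x."""
    return math.log(x)

def sandwich(q, eps):
    """Return (lower, middle, upper) for the convexity bound of Lemma 3.2:
    (phi(q) - phi((1-eps)q))/eps <= phi'(q) q <= (phi((1+eps)q) - phi(q))/eps.
    """
    lower = (phi(q) - phi((1.0 - eps) * q)) / eps
    middle = dphi(q) * q
    upper = (phi((1.0 + eps) * q) - phi(q)) / eps
    return lower, middle, upper

if __name__ == "__main__":
    for q in (0.1, 0.5, 1.0, 3.0, 10.0):
        for eps in (0.5, 0.1, 0.01):
            lo, mid, up = sandwich(q, eps)
            print(q, eps, lo <= mid <= up)
```

As `eps` decreases, both bounds close in on $\varphi'(q) q$, which is the pointwise fact behind the integrability argument.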

Theorem 3.3. Let $\Omega$ be a subset of $\mathcal{M}$ and $Q^*$ be a signed measure in $\Omega \cap \mathrm{dom}\,\phi$. Then

(1) The following are sufficient conditions for $Q^*$ to be a $\phi$-projection of $P$ on $\Omega$: (i) $\varphi'(q^*) q \in L_1(X,P)$ and (ii) $\int \varphi'(q^*)\,dQ^* \le \int \varphi'(q^*)\,dQ$, for all $Q$ in $\Omega \cap \mathrm{dom}\,\phi$.

(2) If condition (3.1) holds and $\Omega$ is convex, then these conditions are necessary as well.



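Proof of Theorem 3.3. Convexity and differentiability of $\varphi$ imply, for all $0 < \varepsilon \le 1$,

$$\varphi'(q^*)(q - q^*) \le \frac{\varphi((1 - \varepsilon) q^* + \varepsilon q) - \varphi(q^*)}{\varepsilon} \le \varphi(q) - \varphi(q^*). \tag{3.2}$$

The middle term in the above display, by the convexity of $\varphi$, decreases to $\varphi'(q^*)(q - q^*)$ when $\varepsilon \downarrow 0$. Furthermore, it is bounded above by $\varphi(q) - \varphi(q^*)$, which belongs to $L_1(X,P)$ for all $Q$ in $\mathrm{dom}\,\phi$. Hence, applying the monotone convergence theorem, we get

$$\int \varphi'(q^*)(q - q^*)\,dP = \lim_{\varepsilon \downarrow 0} \int \frac{\varphi((1 - \varepsilon) q^* + \varepsilon q) - \varphi(q^*)}{\varepsilon}\,dP, \quad \text{for all } Q \in \mathrm{dom}\,\phi. \tag{3.3}$$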

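Proof of part (1): Integrating (3.2) with respect to $P$ and using (i) and (ii) in part (1) of the theorem, we obtain for all $Q$ in $\Omega \cap \mathrm{dom}\,\phi$

$$\phi(Q,P) - \phi(Q^*,P) \ge \int \varphi'(q^*)(q - q^*)\,dP = \int \varphi'(q^*)\,dQ - \int \varphi'(q^*)\,dQ^* \ge 0. \tag{3.4}$$

Hence, $Q^*$ is a $\phi$-projection of $P$ on $\Omega$.

Proof of part (2): Convexity of both $\Omega$ and $\mathrm{dom}\,\phi$ implies that for all $Q \in \Omega \cap \mathrm{dom}\,\phi$, $(1 - \varepsilon) Q^* + \varepsilon Q$ belongs to $\Omega \cap \mathrm{dom}\,\phi$. Since $Q^*$ is a $\phi$-projection of $P$ on $\Omega$, for all $Q \in \Omega \cap \mathrm{dom}\,\phi$ and all $\varepsilon$ satisfying $0 < \varepsilon < 1$, we get $\phi((1 - \varepsilon) Q^* + \varepsilon Q, P) - \phi(Q^*, P) \ge 0$. Combining this with (3.3), we obtain for all $Q$ in $\Omega \cap \mathrm{dom}\,\phi$

$$\begin{aligned}
\int \varphi'(q^*)(q - q^*)\,dP &= \lim_{\varepsilon \downarrow 0} \int \frac{\varphi((1 - \varepsilon) q^* + \varepsilon q) - \varphi(q^*)}{\varepsilon}\,dP \\
&= \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \left[ \phi((1 - \varepsilon) Q^* + \varepsilon Q, P) - \phi(Q^*, P) \right] \ge 0.
\end{aligned} \tag{3.5}$$

On the other hand, integrating (3.2) with respect to $P$, we obtain for all $Q$ in $\Omega \cap \mathrm{dom}\,\phi$

$$\int \varphi'(q^*)(q - q^*)\,dP \le \phi(Q,P) - \phi(Q^*,P) < \infty. \tag{3.6}$$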






MICHEL BRONIATOWSKI* AND AMOR KEZIOU** 



Hence, (3.5) and (3.6) imply

$$\varphi'(q^*)(q - q^*) \in L_1(X,P), \quad \text{for all } Q \in \Omega \cap \mathrm{dom}\,\phi. \tag{3.7}$$

By Lemma 3.2, $\varphi'(q^*) q^* \in L_1(X,P)$. Combining this with (3.7), we obtain that

for all $Q \in \Omega \cap \mathrm{dom}\,\phi$, we have $\varphi'(q^*) q \in L_1(X,P)$

and $\int \varphi'(q^*)\,dQ^* \le \int \varphi'(q^*)\,dQ$. This completes the proof of Theorem 3.3. ■

3.2. On sets defined by linear constraints. In this subsection, we consider the problems of existence and characterization of $\phi$-projections of some p.m. $P$ on a linear set $S$ of measures in $\mathcal{M}$ defined by an arbitrary family of constraints. So, let $\mathcal{G}$ denote a collection (finite or infinite, countable or not) of real valued functions defined on $(X, \mathcal{B})$. The class $\mathcal{G}$ is assumed to contain the function $1_X$. The set $S$ is defined by

$$S := \left\{ Q \in \mathcal{M}_{\mathcal{G}}(P) \text{ such that } \int dQ = 1,\ \int g\,dQ = 0, \text{ for all } g \text{ in } \mathcal{G} \setminus \{1_X\} \right\}. \tag{3.8}$$

The following result states the explicit form of $Q^*$, a $\phi$-projection of $P$ on $S$, when it exists.



Theorem 3.4.

(1) Let $Q^*$ be some finite measure in $S \cap \mathrm{dom}\,\phi$. A sufficient condition for $Q^*$ to be a $\phi$-projection of $P$ on $S$ is that there exist numbers $c_1, \ldots, c_d \in \mathbb{R}$ and functions $g_1, \ldots, g_d \in \mathcal{G}$ such that $\varphi'(q^*(x)) = c_1 g_1(x) + \cdots + c_d g_d(x)$ ($P$-a.e.).

(2) Assume that condition (3.1) holds. Then, any $\phi$-projection, say $Q^*$, of $P$ on $S$, if it exists, satisfies: $\varphi'(q^*)$ belongs to $\overline{\langle \mathcal{G} \rangle}$, the closure of $\langle \mathcal{G} \rangle$ in $L_1(X, |Q^*|)$.



If $\mathcal{G}$ is a finite collection of functions in $L_1(X, |Q^*|)$, then the vector space $\langle \mathcal{G} \rangle$ is closed in $L_1(X, |Q^*|)$. So, from the above theorem, we can state the following corollary:



Corollary 3.5. Let $\mathcal{G} := \{1_X, g_1, \ldots, g_l\}$ be a finite collection of measurable functions on $X$. Then (1) and (2) below hold.

(1) Let $Q^*$ be some measure in $S \cap \mathrm{dom}\,\phi$. A sufficient condition for $Q^*$ to be a $\phi$-projection of $P$ on $S$ is that there exists some constant $c \in \mathbb{R}^{1+l}$ such that

$$\varphi'\left(\frac{dQ^*}{dP}(x)\right) = c_0 + \sum_{i=1}^{l} c_i g_i(x) \quad (P\text{-a.e.}).$$

(2) Assume that condition (3.1) holds. Then any $\phi$-projection, say $Q^*$, of $P$ on $S$, if it exists, satisfies: there exists some constant $c \in \mathbb{R}^{1+l}$ such that

$$\varphi'\left(\frac{dQ^*}{dP}(x)\right) = c_0 + \sum_{i=1}^{l} c_i g_i(x) \quad (|Q^*|\text{-a.e.}).$$



It should be noticed that the preceding theorem and corollary do not provide a definite description of the projected measure; indeed, they do not give any information on the support of $|Q^*|$ (see Example 3.1 below). However, if $\varphi(0) = +\infty$ (which holds for example for the $KL_m$-divergence), then any $\phi$-projection $Q^*$ of $P$ on some set $\Omega$, if it exists, obviously has the same support as $P$ when $\phi(\Omega,P)$ is finite. Furthermore, we prove in the following lemma that if $\varphi'(0) = -\infty$ (which holds for instance in the case of $KL$, $KL_m$ and Hellinger divergences), then any $\phi$-projection of $P$ on some convex set $\Omega$, when it exists, has the same support as $P$. First, we state the following corollary, which applies in the $KL_m$-divergence case.

Corollary 3.6. Let $\mathcal{G}$ be defined as in Corollary 3.5. Assume that assumption (3.1) holds. Suppose additionally that $\varphi(0) = +\infty$, and let $Q^*$ be some p.m. in $S \cap \mathrm{dom}\,\phi$. Then $Q^*$ is a $\phi$-projection of $P$ on $S$ iff there exists some constant $c \in \mathbb{R}^{1+l}$ such that

$$\varphi'\left(\frac{dQ^*}{dP}(x)\right) = c_0 + \sum_{i=1}^{l} c_i g_i(x) \quad (P\text{-a.e.}).$$

Lemma 3.7. Assume that condition (3.1) holds, $a_\varphi = 0$ and $\varphi'(0) = -\infty$. Let $\Omega$ be some convex set of signed finite measures. If there exists some $Q_0 \in \Omega \cap \mathrm{dom}\,\phi$ such that $\frac{dQ_0}{dP} > 0$ ($P$-a.e.), then any $\phi$-projection, say $Q^*$, of $P$ on $\Omega$, if it exists, has the same support as $P$, i.e., $\frac{dQ^*}{dP} > 0$ ($P$-a.e.).

Proof of Lemma 3.7. Let $A := \{x \in X;\ q^*(x) = 0\}$. Suppose that $P(A) > 0$. Since $Q_0$ and $P$ have the same support by assumption, $Q_0(A) > 0$. By (3.2) (replacing $Q$ by $Q_0$), $Q_0(A) > 0$ implies that $\int \varphi'(q^*) q_0\,dP = -\infty$ since $\int |\varphi'(q^*) q^*|\,dP < \infty$. This contradicts (3.5), which completes the proof. ■

We can now state, from the above theorem, the following corollary, which applies in the case of $KL$, $KL_m$ and Hellinger divergences.

Corollary 3.8. Let $\mathcal{G}$ be defined as in Corollary 3.5. Assume that assumption (3.1) holds. Suppose additionally that $a_\varphi = 0$ and $\varphi'(0) = -\infty$. If there exists some $Q_0 \in S \cap \mathrm{dom}\,\phi$ such that $\frac{dQ_0}{dP} > 0$ ($P$-a.e.), then the following holds: a p.m. $Q^*$ in $S \cap \mathrm{dom}\,\phi$ is a $\phi$-projection of $P$ on $S$ iff there exists some constant $c \in \mathbb{R}^{1+l}$ such that

$$\varphi'\left(\frac{dQ^*}{dP}(x)\right) = c_0 + \sum_{i=1}^{l} c_i g_i(x) \quad (P\text{-a.e.}).$$



Remark 3.3. Versions of Theorem 3.4 for sets of p.m.'s have been proved by Csiszár (1975) and Csiszár (1984) for the Kullback-Leibler divergence, and by Rüschendorf (1984) and Liese and Vajda (1987) for $\phi$-divergences between p.m.'s. We prove it in the present context, that is, when the set $S$ (see (3.8)) is a subset of signed finite measures and $P$ is a p.m., using similar techniques.



Proof of Theorem 3.4. We start by proving (1). If $\varphi'(q^*)$ belongs to $\langle \mathcal{G} \rangle$, then for all $Q$ in $S$, we have $\int \varphi'(q^*)\,dQ^* = \int \varphi'(q^*)\,dQ$ which, by the first part of Theorem 3.3, proves that $Q^*$ is a $\phi$-projection of $P$ on $S$.

Proof of part (2): Since $Q^*$ is a signed finite measure, by the Hahn decomposition theorem, there exists a partition $X = X_1 \cup X_2$ such that $X_1, X_2 \in \mathcal{B}$ and satisfying

for all $B \in \mathcal{B}$ such that $B \subset X_1$, we have $Q^*(B) \ge 0$

and

for all $B \in \mathcal{B}$ such that $B \subset X_2$, we have $Q^*(B) \le 0$.

Denote by $Q^*_+$ and $Q^*_-$ respectively the nonnegative variation and the nonpositive variation of $Q^*$, which are defined, for all $B \in \mathcal{B}$, by

$$Q^*_+(B) := Q^*(B \cap X_1) \quad \text{and} \quad Q^*_-(B) := -Q^*(B \cap X_2).$$
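So, $Q^*_+$ and $Q^*_-$ are nonnegative finite measures, $Q^* = Q^*_+ - Q^*_-$ and the total variation $|Q^*|$ is, by definition, the nonnegative measure $Q^*_+ + Q^*_-$. Denote by $\langle \mathcal{G} \rangle^\perp_+$ and by $\langle \mathcal{G} \rangle^\perp_-$ respectively the orthogonal of $\langle \mathcal{G} \rangle$ in $L_1(X, Q^*_+)$ and in $L_1(X, Q^*_-)$, i.e., the sets defined by

$$\langle \mathcal{G} \rangle^\perp_+ := \left\{ h \in L_\infty(X, Q^*_+) \text{ such that } \int f h\,dQ^*_+ = 0, \text{ for all } f \in \langle \mathcal{G} \rangle \right\}$$

and

$$\langle \mathcal{G} \rangle^\perp_- := \left\{ h \in L_\infty(X, Q^*_-) \text{ such that } \int f h\,dQ^*_- = 0, \text{ for all } f \in \langle \mathcal{G} \rangle \right\}.$$

We will prove that the two following assertions hold:

$$\text{for all } h \in \langle \mathcal{G} \rangle^\perp_+, \text{ we have } \int \varphi'(q^*) h\,dQ^*_+ = 0 \tag{3.9}$$

and

$$\text{for all } h \in \langle \mathcal{G} \rangle^\perp_-, \text{ we have } \int \varphi'(q^*) h\,dQ^*_- = 0. \tag{3.10}$$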

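We prove (3.9) by contradiction: assume that there exists $h$ in $\langle \mathcal{G} \rangle^\perp_+$ such that $\int \varphi'(q^*) h\,dQ^*_+ \ne 0$. We then have either (a) $\int \varphi'(q^*) h\,dQ^*_+ < 0$ or (b) $\int \varphi'(q^*) h\,dQ^*_+ > 0$. Assume (a). For $0 < \varepsilon < \delta$ (with $\delta$ as in condition (3.1)), define the measure $Q_0$ by

$$dQ_0 := \left(1 + \varepsilon \frac{h}{\|h\|_\infty}\right) dQ^*, \tag{3.11}$$

where $h$ is extended by $0$ outside $X_1$. Then $Q_0$ belongs to $S$, and, following condition (3.1), $Q_0$ belongs to $\mathrm{dom}\,\phi$ by Lemma 3.1. Furthermore,

$$\int \varphi'(q^*)\,dQ_0 = \int \varphi'(q^*)\,dQ^* + \frac{\varepsilon}{\|h\|_\infty} \int \varphi'(q^*) h\,dQ^*_+ < \int \varphi'(q^*)\,dQ^*,$$

which contradicts the fact that $Q^*$ is a $\phi$-projection of $P$ on $S$ (see part 2 in Theorem 3.3). Assume (b). Consider $-h$ instead of $h$. We thus have proved (3.9). The same arguments hold for the proof of (3.10). Therefore, $\varphi'(q^*)$ belongs to $\left(\langle \mathcal{G} \rangle^\perp_+\right)^\perp$ and to $\left(\langle \mathcal{G} \rangle^\perp_-\right)^\perp$, respectively the orthogonal of $\langle \mathcal{G} \rangle^\perp_+$ in $L_1(X, Q^*_+)$ and the orthogonal of $\langle \mathcal{G} \rangle^\perp_-$ in $L_1(X, Q^*_-)$. By the Hahn-Banach theorem (see e.g. Section 2 of Brezis (1983)), we have

$$\left(\langle \mathcal{G} \rangle^\perp_+\right)^\perp = \overline{\langle \mathcal{G} \rangle}_+ \quad \text{and} \quad \left(\langle \mathcal{G} \rangle^\perp_-\right)^\perp = \overline{\langle \mathcal{G} \rangle}_-,$$

which are respectively the closure of $\langle \mathcal{G} \rangle$ in $L_1(X, Q^*_+)$ and the closure of $\langle \mathcal{G} \rangle$ in $L_1(X, Q^*_-)$. This implies that $\varphi'(q^*) \in \overline{\langle \mathcal{G} \rangle}$, that is, $\varphi'(q^*)$ belongs to the closure of $\langle \mathcal{G} \rangle$ in $L_1(X, |Q^*|)$. This completes the proof of Theorem 3.4. ■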



Footnote 5 (to the proof of Theorem 3.4). Here $\delta$ is the constant defined in condition (3.1).

Example 3.1. Let $X := [0,1]$, $P$ be the uniform distribution on $[0,1]$ and $\mathcal{G} := \{1_{[0,1]}, \mathrm{Id}\}$, where $\mathrm{Id}$ is the identity function. Consider the $\chi^2_+$-divergence associated to the convex function

$$\varphi(x) = \begin{cases} \frac{1}{2}(x - 1)^2 & \text{if } x \in [0, \infty[ \\ +\infty & \text{if } x \in\, ]-\infty, 0[, \end{cases}$$

and consider the set $\mathbf{M}$ defined by

$$\mathbf{M} := \left\{ Q \in \mathcal{M} \text{ such that } \int dQ = 1 \text{ and } \int (x - 1/4)\,dQ(x) = 0 \right\}.$$

We apply the preceding results pertaining to the characterization of the projection of $P$ on $\mathbf{M}$. By Theorem 2.7, there exists a $\chi^2_+$-projection, say $Q^*_+$, of $P$ on $\mathbf{M}$. By Theorem 3.4, there exist two real numbers $c_0$ and $c_1$ such that

$$\frac{dQ^*_+}{dP}(x)\, 1_{\{q^*_+(x) > 0\}} = c_0 + c_1 x. \tag{3.12}$$

The support of $Q^*_+$ is different from the support of $P$; it is strictly included in $[0,1]$. Indeed, if the support of $Q^*_+$ is $[0,1]$, then

$$dQ^*_+(x) = (c_0 + c_1 x)\,dP(x) = (c_0 + c_1 x)\, 1_{[0,1]}(x)\,dx.$$

Using the fact that $Q^*_+$ belongs to $\mathbf{M}$, we obtain that $c_0 = 5/2$ and $c_1 = -3$. So, $Q^*_+$ satisfying $dQ^*_+(x) = (5/2 - 3x)\,dP(x)$ does not belong to $\mathrm{dom}\,\chi^2_+$ (it is not a p.m.), a contradiction with the existence of the projection. This proves that the support of $Q^*_+$ is strictly included in $[0,1]$.

Consider now the $\chi^2$-divergence, i.e., the divergence associated to the convex function

$$x \in\, ]-\infty, +\infty[\ \mapsto \varphi(x) = \frac{1}{2}(x - 1)^2,$$

and the set $\mathbf{M}^1$ defined by

$$\mathbf{M}^1 := \left\{ Q \in \mathcal{M}^1 \text{ such that } \int dQ = 1 \text{ and } \int (x - 1/4)\,dQ(x) = 0 \right\}.$$

Note that minimizing $\chi^2(\cdot, P)$ on $\mathbf{M}^1$ is equivalent to minimizing $\chi^2_+(\cdot, P)$ on $\mathbf{M}$. Hence, $Q^*_+$ is the $\chi^2$-projection of $P$ on $\mathbf{M}^1$, it does not have the same support as $P$, and (3.12) is not a definite description of the projection. On the other hand, the $\chi^2$-projection, say $Q^*$, of $P$ on $\mathbf{M}$ exists, it has the same support as $P$, it is a signed measure and it is characterized by $dQ^*(x) = (5/2 - 3x)\,dP(x)$. This example shows the interest of enlarging $\mathbf{M}^1$ to $\mathbf{M}$.
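The constants of this example can be recovered with exact arithmetic. The sketch below solves the two linear constraints for an affine density $q(x) = c_0 + c_1 x$ w.r.t. the uniform law on $[0,1]$; the affine form is the one given by the characterization $\varphi'(q) = q - 1 \in \langle 1, \mathrm{Id} \rangle$, and the function names are illustrative.

```python
from fractions import Fraction

def affine_density_coefficients(target_mean):
    """Solve for (c0, c1) such that q(x) = c0 + c1*x satisfies, under the
    uniform law on [0, 1], int q dx = 1 and int x q dx = target_mean."""
    # Moments of the uniform law: int 1 = 1, int x = 1/2, int x^2 = 1/3.
    m0, m1, m2 = Fraction(1), Fraction(1, 2), Fraction(1, 3)
    # Linear system: c0*m0 + c1*m1 = 1 and c0*m1 + c1*m2 = target_mean,
    # solved by Cramer's rule.
    det = m0 * m2 - m1 * m1
    c0 = (m2 - m1 * target_mean) / det
    c1 = (m0 * target_mean - m1) / det
    return c0, c1

def chi2_divergence_affine(c0, c1):
    """Exact value of (1/2) * int_0^1 (c0 + c1*x - 1)^2 dx."""
    a = c0 - 1
    return Fraction(1, 2) * (a * a + a * c1 + c1 * c1 / 3)

if __name__ == "__main__":
    c0, c1 = affine_density_coefficients(Fraction(1, 4))
    print("c0 =", c0, "c1 =", c1)            # density 5/2 - 3x
    print("q(1) =", c0 + c1)                  # negative: signed measure
    print("chi2 =", chi2_divergence_affine(c0, c1))
```

The density takes the value $q(1) = -1/2$, which confirms that the minimizer over signed measures is not a probability measure.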



4. Fenchel duality for $\phi$-divergences

We refer to Fenchel (1949), Moreau (1962), Brøndsted (1964), Rockafellar (1968), Rockafellar (1974) and Ekeland and Temam (1999) for the notion of Fenchel duality of general convex functions on general vector spaces. We consider this notion for $\phi$-divergence functionals $Q \mapsto \phi(Q,P)$ viewed as convex functions on the vector space of signed finite measures $\mathcal{M}_{\mathcal{F}}$; we give different versions of dual representations of the $\phi$-divergences (see Theorems 4.1, 4.3, 4.4 and 4.5 below). In view of Proposition 2.1, we identify the topological dual space of $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$ with $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$ and the topological dual space of $[\langle \mathcal{F} \cup \mathcal{B}_b \rangle; \tau_{\mathcal{M}}]$ with $\mathcal{M}_{\mathcal{F}}$. Hence, the Fenchel-Legendre transform (i.e., the conjugate) of the convex function $Q \in [\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}] \mapsto \phi(Q,P) \in [0, +\infty]$ is defined as follows

$$f \in [\langle \mathcal{F} \cup \mathcal{B}_b \rangle; \tau_{\mathcal{M}}] \mapsto \phi^*(f) := \sup_{Q \in \mathcal{M}_{\mathcal{F}}} \left\{ \int f\,dQ - \phi(Q,P) \right\}, \tag{4.1}$$



which is convex and lower semi-continuous (see footnote 6) w.r.t. the $\tau_{\mathcal{M}}$-topology, the weak topology induced on $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$ by $\mathcal{M}_{\mathcal{F}}$.

By the lower semi-continuity of the convex function $Q \in [\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}] \mapsto \phi(Q,P) \in [0, +\infty]$ (see Proposition 2.2 above), applying the Fenchel duality theory (see e.g. Rockafellar (1968), Fenchel (1949) or Dembo and Zeitouni (1998) Lemma 4.5.8), we can state the following result for any $\phi$-divergence.

Footnote 6. Note that the conjugate of a convex function is always l.s.c. w.r.t. the weak topology.
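Theorem 4.1. The function $Q \in [\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}] \mapsto \phi(Q,P) \in [0, +\infty]$ is the conjugate of its conjugate $f \in [\langle \mathcal{F} \cup \mathcal{B}_b \rangle; \tau_{\mathcal{M}}] \mapsto \phi^*(f)$ defined by (4.1). In other words, the $\phi$-divergence $\phi(Q,P)$ admits the dual representation

$$\phi(Q,P) = \sup_{f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle} \left\{ \int f\,dQ - \phi^*(f) \right\}, \quad \text{for all } Q \in \mathcal{M}_{\mathcal{F}}, \tag{4.2}$$

where $\phi^*(\cdot)$ is defined by (4.1).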



We now turn to the calculation of $\phi^*(f)$ (in particular the equality $\phi^*(f) = \int \varphi^*(f)\,dP$), and the problems of existence, uniqueness and characterization of a dual optimal solution in (4.2) (i.e., a function $f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle$ which realizes the supremum in (4.2)) when $\phi(Q,P)$ is finite.

In the following proposition, when $\varphi$ is strictly convex and differentiable, we give the explicit form of $\phi^*(f)$ for all $f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle$ such that $\mathrm{Im} f \subset \mathrm{Im}\,\varphi'$.

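Proposition 4.2. Assume that $\varphi$ is strictly convex and differentiable, and that for all $f, g \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle$ such that $\mathrm{Im} f \subset \mathrm{Im}\,\varphi'$, the integrals

$$\int \left| g\, \varphi'^{-1}(f) \right| dP \quad \text{and} \quad \int \varphi\left(\varphi'^{-1}(f)\right) dP \quad \text{are finite.} \tag{4.3}$$

Then for all $f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle$ such that $\mathrm{Im} f \subset \mathrm{Im}\,\varphi'$, we have

$$\phi^*(f) \text{ is finite, and } \phi^*(f) = \int \varphi^*(f)\,dP = \int \left[ f \varphi'^{-1}(f) - \varphi\left(\varphi'^{-1}(f)\right) \right] dP. \tag{4.4}$$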
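As a worked instance of the closed form in (4.4), take the $\chi^2$ generator $\varphi(x) = \frac{1}{2}(x-1)^2$ on the whole real line, as in Example 3.1. The computation below is only an illustration of the formula; the integrability requirements (4.3) are assumed to hold for the functions $f$ considered.

```latex
% \chi^2 generator: \varphi(x) = \tfrac12 (x-1)^2, so that
% \varphi'(x) = x - 1, \mathrm{Im}\,\varphi' = \mathbb{R}, and \varphi'^{-1}(y) = y + 1.
\[
\varphi^*(y) = y\,\varphi'^{-1}(y) - \varphi\bigl(\varphi'^{-1}(y)\bigr)
             = y(y+1) - \tfrac12 y^2
             = \tfrac12 y^2 + y .
\]
% Formula (4.4) then gives, for every admissible f,
\[
\phi^*(f) = \int \Bigl( \tfrac12 f^2 + f \Bigr)\,dP ,
\]
% and the minimizing measure has density \varphi'^{-1}(f) = f + 1 with respect to P.
```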

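Proof of Proposition 4.2. For all $f$ in $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$, define the mapping $G_f : \mathcal{M}_{\mathcal{F}} \to\ ]-\infty, +\infty]$ by

$$G_f(Q) := \phi(Q,P) - \int f\,dQ,$$

from which $\phi^*(f) = -\inf_{Q \in \mathcal{M}_{\mathcal{F}}} G_f(Q)$. The function $G_f(\cdot)$ is strictly convex. Its domain is

$$\mathrm{dom}\,G_f := \{Q \in \mathcal{M}_{\mathcal{F}} \text{ such that } G_f(Q) < +\infty\}.$$

Denote by $Q_0 := \arg\inf_{Q \in \mathcal{M}_{\mathcal{F}}} G_f(Q)$, which belongs to $\mathrm{dom}\,G_f$, if it exists. It follows that $Q_0$ is a.c. w.r.t. $P$. Since $\mathcal{M}_{\mathcal{F}}$ is a convex set, the measure $Q_0$ (if it exists) is the only measure in $\mathrm{dom}\,G_f$ such that for any measure $R$ in $\mathrm{dom}\,G_f$,

$$G'_f(Q_0, R - Q_0) \ge 0,$$

where $G'_f(Q_0, R - Q_0)$ is the directional derivative of the function $G_f$ at point $Q_0$ in direction $R - Q_0$; see e.g. Theorem III.31 in Azé (1997). Denote $r := \frac{dR}{dP}$ and $q_0 := \frac{dQ_0}{dP}$. By its very definition, we have

$$\begin{aligned}
G'_f(Q_0, R - Q_0) &:= \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \left\{ G_f(Q_0 + \varepsilon(R - Q_0)) - G_f(Q_0) \right\} \\
&= \lim_{\varepsilon \downarrow 0} \int \frac{1}{\varepsilon} \left[ \varphi(q_0 + \varepsilon(r - q_0)) - \varphi(q_0) \right] dP - \int f\,d(R - Q_0).
\end{aligned}$$

Define the function

$$g(\varepsilon) := -\frac{1}{\varepsilon} \left[ \varphi(q_0 + \varepsilon(r - q_0)) - \varphi(q_0) \right].$$

Convexity of $\varphi$ implies

$$g(\varepsilon) \uparrow \varphi'(q_0)(q_0 - r) \quad \text{when } \varepsilon \downarrow 0,$$

and for all $0 < \varepsilon \le 1$ and $R$ in $\mathrm{dom}\,G_f$, we have

$$g(\varepsilon) \ge g(1) = -\left( \varphi(r) - \varphi(q_0) \right) \in L_1(X,P).$$

So, applying the monotone convergence theorem, we obtain

$$G'_f(Q_0, R - Q_0) = \int \left( \varphi'(q_0) - f \right) d(R - Q_0) \ge 0.$$

Therefore, under assumption (4.3), for any function $f$ in $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$ such that $\mathrm{Im} f \subset \mathrm{Im}\,\varphi'$, the measure $Q_0$ exists and it is given by $dQ_0 = \varphi'^{-1}(f)\,dP$. It follows that

$$\phi^*(f) = \int \varphi^*(f)\,dP = \int \left[ f \varphi'^{-1}(f) - \varphi\left(\varphi'^{-1}(f)\right) \right] dP. \tag{4.5}$$ ■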

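Remark 4.1. If the convex function $Q \in \mathcal{M}_{\mathcal{F}} \mapsto \phi(Q,P) \in [0, +\infty]$ is proper, i.e.,

$$\text{there exists at least one measure } Q_0 \text{ in } \mathcal{M}_{\mathcal{F}}(P) \text{ such that } \phi(Q_0, P) \text{ is finite,} \tag{4.6}$$

then the integral $\int \varphi^*(f)\,dP$ is well defined for all $f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle$. Indeed, for all $f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle$ and for all $x \in X$, by Fenchel's inequality, we have

$$\varphi^*(f(x)) \ge f(x) \frac{dQ_0}{dP}(x) - \varphi\left(\frac{dQ_0}{dP}(x)\right). \tag{4.7}$$

The RHS term belongs to $L_1(X,P)$ by assumption (4.6). Hence, the integral $\int \varphi^*(f)\,dP$ is well defined. Moreover, we have $-\infty < \int \varphi^*(f)\,dP \le +\infty$ for all $f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle$. Hence, from Theorem 4.1 we can state the following result.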



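Theorem 4.3. Assume that $\varphi$ is differentiable. Then, for all $Q \in \mathcal{M}_{\mathcal{F}}$ such that $\phi(Q,P)$ is finite and $\varphi'\left(\frac{dQ}{dP}\right)$ belongs to $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$, the $\phi$-divergence $\phi(Q,P)$ admits the dual representation

$$\phi(Q,P) = \sup_{f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle} \left\{ \int f\,dQ - \int \varphi^*(f)\,dP \right\}, \tag{4.8}$$

and the function $f := \varphi'\left(\frac{dQ}{dP}\right)$ is a dual optimal solution. Furthermore, if $\varphi$ is essentially smooth, then $f$ is the unique dual optimal solution ($P$-a.e.).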
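The dual representation can be checked on a finite space, where every function is bounded and all integrals are finite sums. The sketch below uses the $KL$ generator $\varphi(x) = x \log x - x + 1$ and its conjugate $\varphi^*(y) = e^y - 1$; both are assumed standard forms not restated in this section, and the function names are ours.

```python
import math

def phi(x):
    """KL generator (assumed form): x log x - x + 1 for x > 0."""
    return x * math.log(x) - x + 1.0

def phi_star(y):
    """Convex conjugate of the KL generator: exp(y) - 1."""
    return math.exp(y) - 1.0

def divergence(q, p):
    """phi(Q, P) = sum_x p(x) * phi(q(x)/p(x)) on a finite space."""
    return sum(pi * phi(qi / pi) for qi, pi in zip(q, p))

def dual_objective(f, q, p):
    """int f dQ - int phi*(f) dP for f given by its list of values."""
    integral_f_dq = sum(fi * qi for fi, qi in zip(f, q))
    integral_conj = sum(phi_star(fi) * pi for fi, pi in zip(f, p))
    return integral_f_dq - integral_conj

def optimal_f(q, p):
    """Candidate dual optimal solution f = phi'(dQ/dP) = log(q/p)."""
    return [math.log(qi / pi) for qi, pi in zip(q, p)]

if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]
    q = [0.5, 0.25, 0.25]
    f_opt = optimal_f(q, p)
    print("divergence    :", divergence(q, p))
    print("dual at f_opt :", dual_objective(f_opt, q, p))
    shifted = [fi + 0.3 for fi in f_opt]
    print("dual at other f:", dual_objective(shifted, q, p))
```

Any other choice of `f` gives a value of the dual objective that does not exceed the divergence, by Fenchel's inequality applied pointwise.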

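Proof of Theorem 4.3. Let $Q \in \mathcal{M}_{\mathcal{F}}$ such that $\phi(Q,P)$ is finite. Then, the integral $\int \varphi^*(f)\,dP$ is well defined for all $f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle$; see Remark 4.1. Furthermore, using (4.7) for all $Q \in \mathcal{M}_{\mathcal{F}}(P)$, we can see that $\phi^*(f) \le \int \varphi^*(f)\,dP$ for all $f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle$. Hence, using Theorem 4.1, we can write

$$\phi(Q,P) = \sup_{f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle} \left\{ \int f\,dQ - \phi^*(f) \right\} \ge \sup_{f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle} \left\{ \int f\,dQ - \int \varphi^*(f)\,dP \right\}.$$

On the other hand, by (1.10), we obtain for the function $f := \varphi'(dQ/dP)$,

$$\varphi^*(f) = \varphi'\left(\frac{dQ}{dP}\right) \frac{dQ}{dP} - \varphi\left(\frac{dQ}{dP}\right).$$

From this, using the fact that the integrals $\int \left|\varphi'\left(\frac{dQ}{dP}\right)\right| d|Q|$ and $\int \varphi\left(\frac{dQ}{dP}\right) dP$ are finite, by simple calculus we obtain the equality $\int f\,dQ - \int \varphi^*(f)\,dP = \phi(Q,P)$, which completes the proof. ■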



Theorem 4.3 remains valid if we replace the vector space $\langle \mathcal{F} \cup \mathcal{B}_b \rangle$ by the arbitrary class of functions $\mathcal{F}$. We state this result in the following theorem.
Theorem 4.4. Assume that $\varphi$ is differentiable. Let $\mathcal{F}$ be an arbitrary class of measurable real valued functions on $X$. Then, for all $Q \in \mathcal{M}_{\mathcal{F}}$ such that $\phi(Q,P)$ is finite and $\varphi'\left(\frac{dQ}{dP}\right)$ belongs to $\mathcal{F}$, the $\phi$-divergence $\phi(Q,P)$ admits the dual representation

$$\phi(Q,P) = \sup_{f \in \mathcal{F}} \left\{ \int f\,dQ - \int \varphi^*(f)\,dP \right\}, \tag{4.9}$$

and the function $f := \varphi'\left(\frac{dQ}{dP}\right)$ is a dual optimal solution. Furthermore, if $\varphi$ is essentially smooth, then $f$ is the unique dual optimal solution ($P$-a.e.).



Remar k 4. 2. Theorem\4-4i with an approp riate choice of the class T , has been used bu \Kezio~H 
!(2003d ) and \Broniatowski and Kezioi\ I 200 A) to introduce an new common definition of the "mini- 
mum <f>- divergence estimates" in discrete or con tinuous parametr i c mod els. Note that the "plug-in" 
minimum 4>- divergence estimates introduced by\Z iese and VaidA \l98j) in chap t er 10 are defined 
only in discrete parametric models, see also \Lindsai i !99A ) and Morales et al. (199&) . The use 
of the dual representation allows to give a common definition of the minimum <f>-divergence 

estimates in discrete or continuous parametric models. 



Rema rk 4.3. Other versions of dual representations of ' (p - divergences are given in Worwein and Lewis 
jl99i) on Lk(X, P) spaces, in Borwein and Lewis] lil99& ) on compact metric spaces, and in Leonard 
1(200 id ) on Orlicz spaces. See also Rockafellai\ 1(196^1 ) for other convex integral functionals on some 
"decomposable" spaces. 



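Under the assumption

$$\int |f|\,dP \text{ is finite for all } f \in \mathcal{F}, \tag{4.10}$$

the convex function $Q \in [\mathcal{M}_{\mathcal{F}}(P); \tau_{\mathcal{F}}] \mapsto \phi(Q,P) \in [0, +\infty]$ is proper. Its Fenchel-Legendre transform is

$$f \in [\langle \mathcal{F} \cup \mathcal{B}_b \rangle; \tau_{\mathcal{M}}] \mapsto \phi^*(f) := \sup_{Q \in \mathcal{M}_{\mathcal{F}}(P)} \left\{ \int f\,dQ - \phi(Q,P) \right\} \in\ ]-\infty, +\infty], \tag{4.11}$$

which is convex and lower semi-continuous. Following Rockafellar (1968) p. 532, let $L^* := \langle \mathcal{F} \cup \mathcal{B}_b \rangle$ and $L := \mathcal{M}_{\mathcal{F}}(P)$. Then condition (4.10) implies that both $L^*$ and $L$ are decomposable. Hence, we can apply the Corollary of Theorem 2 in Rockafellar (1968) to obtain the following result: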


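Theorem 4.5. Under assumption (4.10), the convex conjugate function $f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle \mapsto \phi^*(f)$ defined by (4.11) is proper, it can be expressed by

$$\phi^*(f) = \int \varphi^*(f)\,dP \quad \text{for all } f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle, \tag{4.12}$$

and the $\phi$-divergence $\phi(Q,P)$ admits the dual representation

$$\phi(Q,P) = \sup_{f \in \langle \mathcal{F} \cup \mathcal{B}_b \rangle} \left\{ \int f\,dQ - \int \varphi^*(f)\,dP \right\}, \quad \text{for all } Q \in \mathcal{M}_{\mathcal{F}}(P). \tag{4.13}$$

In particular, the function $Q \in [\mathcal{M}_{\mathcal{F}}(P); \tau_{\mathcal{F}}] \mapsto \phi(Q,P) \in [0, +\infty]$ is lower semi-continuous.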
Remark 4.4. The lower semi-continuity property of the function

$Q \in [\mathcal{M}_{\mathcal{F}}(P); \tau_{\mathcal{F}}] \mapsto \phi(Q,P) \in [0,+\infty]$

holds from Proposition 2.2 and Lemma 2.3 without assuming (4.10). On the other hand, Theorems
4.4 and 4.5 are of interest particularly when $\phi(Q,P)$ is finite and the class $\mathcal{F}$ contains the function
$f = \varphi'(dQ/dP)$. In Theorem 4.4, the condition on $Q$, i.e., $\int |\varphi'(dQ/dP)| \, d|Q| < \infty$, holds whenever
$\phi(Q,P)$ is finite and $\varphi$ satisfies condition (3.1); see Lemma 3.2. However, in Theorem 4.5 these
conditions do not necessarily imply assumption (4.10) if the class $\mathcal{F}$ contains $\varphi'(dQ/dP)$. This is the
case, for example, when $\phi = KL$, $Q$ is a normal law and $P$ is a Cauchy law. Indeed, $KL(Q,P)$
is finite, the assumption $\int |\log(dQ/dP)| \, dQ < \infty$ in Theorem 4.4 holds while the assumption
$\int |\log(dQ/dP)| \, dP < \infty$ in Theorem 4.5 does not. This shows the interest of Proposition 2.2 and
Theorem 4.4.
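
To make the normal/Cauchy example explicit, take for illustration the standard laws (this specific choice of parameters is an assumption, the remark only says "a normal law" and "a Cauchy law"): $Q$ has density $q(x) = (2\pi)^{-1/2} e^{-x^2/2}$ and $P$ has density $p(x) = 1/(\pi(1+x^2))$. Then

$$\log\frac{dQ}{dP}(x) = -\frac{x^2}{2} + \log(1+x^2) + \frac{1}{2}\log\frac{\pi}{2}.$$

Since $Q$ has finite moments of all orders, both $x^2$ and $\log(1+x^2)$ are $Q$-integrable, so $\int |\log(dQ/dP)| \, dQ < \infty$ and $KL(Q,P)$ is finite. Under $P$, the term $\log(1+x^2)$ is still integrable, but

$$\int x^2 \, dP(x) = \frac{1}{\pi}\int_{-\infty}^{+\infty} \frac{x^2}{1+x^2} \, dx = +\infty,$$

hence $\int |\log(dQ/dP)| \, dP = +\infty$ and assumption (4.10) fails for any class $\mathcal{F}$ containing $\log(dQ/dP)$.
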



5. Applications to the minimization of $\phi$-divergences on sets of signed finite
measures satisfying linear constraints

In this section we apply the results of Sections 2, 3 and 4 to the optimization problem

$\inf_{Q \in \mathcal{M}_g} \phi(Q,P)$,

where $\mathcal{M}_g$ is defined in (1.13).

Under different assumptions, we obtain the dual equality inf (1.14) = sup (1.15) and results about
the problems of existence, uniqueness and characterization of the dual optimal solution and the
$\phi$-projections of $P$ on the set $\mathcal{M}_g$.

We state our results under the following assumptions: 

(5.1)  the convex function $\varphi$ is differentiable;

(5.2)  there exists at least one $\phi$-projection $Q^*$ of $P$ on $\mathcal{M}_g$ with the same support as $P$.

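Theorem 5.1. Assume that conditions (3.1), (5.1) and (5.2) hold. Then
(1) there exists $\overline{\lambda} \in \mathbb{R}^{1+l}$ such that

$\varphi'\left( \frac{dQ^*}{dP}(x) \right) = \overline{\lambda}^T g(x)$  ($P$-a.e.);
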
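(2) the equality

$\inf_{Q \in \mathcal{M}_g} \phi(Q,P) = \sup_{\lambda \in \mathbb{R}^{1+l}} \left\{ \lambda_0 - \int_{\mathcal{X}} \varphi^*(\lambda^T g(x)) \, dP(x) \right\}$

holds, and $\overline{\lambda}$ is a dual optimal solution. Furthermore, if the function $\varphi$ is essentially
smooth, then the dual optimal solution $\overline{\lambda}$ is unique.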

Remark 5.1. Under the assumptions of Theorem 5.1, the $\phi$-projection of $P$ on $\mathcal{M}_g$ is characterized
without supposing that $\overline{\lambda}$ is an interior point of $\mathrm{dom}\,\phi^*$. Furthermore, the dual equality holds and
the dual optimal solution is attained. Sufficient conditions for assumption (5.2) are given in
Corollary 5.2 and Proposition 5.3 below.
Proof of Theorem 5.1. Under assumptions (3.1) and (5.1), part (1) is a direct consequence of
Theorem 3.4 part (2). We now prove part (2). We have $\inf_{Q \in \mathcal{M}_g} \phi(Q,P) = \phi(Q^*,P)$ since $Q^*$ is a
$\phi$-projection of $P$ on $\mathcal{M}_g$. Now, by Theorem 4.4, choosing the class of measurable functions

$\mathcal{F} = \left\{ x \in \mathcal{X} \mapsto \lambda^T g(x) \text{ such that } \lambda \in \mathbb{R}^{1+l} \right\}$,

we can write

$\phi(Q^*,P) = \sup_{\lambda \in \mathbb{R}^{1+l}} \left\{ \lambda_0 - \int \varphi^*(\lambda^T g(x)) \, dP(x) \right\}$,

and from it we deduce that $\overline{\lambda}$ is a dual optimal solution by the same Theorem. ■
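
The characterization in part (1) and the dual equality in part (2) can be illustrated numerically. The sketch below is a minimal example under stated assumptions, not the construction used in the proof: it takes the Kullback-Leibler generator $\varphi(x) = x \log x - x + 1$ (so that $\varphi'^{-1}(t) = e^t$ and $\varphi^*(t) = e^t - 1$), a hypothetical three-point space, $g_0 = 1$ and a single constraint function $g_1(x) = x - 0.5$. The stationarity condition in $\lambda_0$ is solved in closed form and the one in $\lambda_1$ by bisection.

```python
import math

# Hypothetical finite sample space, reference weights and mean constraint.
x = [-1.0, 0.0, 2.0]
p = [0.5, 0.3, 0.2]
target_mean = 0.5
g1 = [xi - target_mean for xi in x]   # constraint: sum_i g1_i * q_i = 0

def tilted_moment(lam1):
    """sum_i p_i * g1_i * exp(lam1 * g1_i); increasing in lam1, zero at the optimum."""
    return sum(pi * gi * math.exp(lam1 * gi) for pi, gi in zip(p, g1))

# Bisection on a bracket where tilted_moment changes sign.
lo, hi = -5.0, 5.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if tilted_moment(mid) < 0.0:
        lo = mid
    else:
        hi = mid
lam1 = 0.5 * (lo + hi)

# Stationarity in lam0 (total mass equal to 1) gives lam0 in closed form.
lam0 = -math.log(sum(pi * math.exp(lam1 * gi) for pi, gi in zip(p, g1)))

# Part (1): dQ*/dP = phi'^{-1}(lam^T g) = exp(lam0 + lam1 * g1).
q_star = [pi * math.exp(lam0 + lam1 * gi) for pi, gi in zip(p, g1)]

# Part (2): primal value phi(Q*, P) and dual value at (lam0, lam1).
primal = sum(qi * math.log(qi / pi) - qi + pi for qi, pi in zip(q_star, p))
dual = lam0 - sum(pi * (math.exp(lam0 + lam1 * gi) - 1.0) for pi, gi in zip(p, g1))
```

Here `q_star` has total mass 1 and satisfies the mean constraint, and `primal` and `dual` coincide up to rounding; both equal `lam0` in this particular setting.
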



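Corollary 5.2. Assume that $\varphi$ is differentiable and strictly convex. If there exists some $\overline{\lambda} \in \mathbb{R}^{1+l}$
such that

(5.3)  $\int \varphi\left( \varphi'^{-1}\left( \overline{\lambda}^T g(x) \right) \right) dP(x) < \infty$  and  $\int g(x) \, \varphi'^{-1}\left( \overline{\lambda}^T g(x) \right) dP(x) = (1, 0, \ldots, 0)^T$,

then

(1) the measure $Q^*$ defined by $dQ^*(x) = \varphi'^{-1}\left( \overline{\lambda}^T g(x) \right) dP(x)$ is the unique $\phi$-projection of
$P$ on $\mathcal{M}_g$.

(2) the equality

$\inf_{Q \in \mathcal{M}_g} \phi(Q,P) = \sup_{\lambda \in \mathbb{R}^{1+l}} \left\{ \lambda_0 - \int_{\mathcal{X}} \varphi^*(\lambda^T g(x)) \, dP(x) \right\}$

holds, and $\overline{\lambda}$ is a dual optimal solution. Furthermore, if the function $\varphi$ is essentially
smooth, then the dual optimal solution $\overline{\lambda}$ is unique.

In particular, (5.3) holds if there exists a dual optimal solution $\overline{\lambda}$ which is an interior point of
$\mathrm{dom}\,\phi^* := \left\{ \lambda \in \mathbb{R}^{1+l} \text{ such that } \int |\varphi^*(\lambda^T g(x))| \, dP(x) \text{ is finite} \right\}$.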



Proof of Corollary 5.2. (1) Apply Theorem 3.4 part (1). (2) The proof is the same as that of
part (2) of Theorem 5.1. ■
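
The following minimal Python sketch illustrates Corollary 5.2 in a case where the $\phi$-projection is a genuinely signed measure. It assumes the $\chi^2$ generator $\varphi(x) = \frac{1}{2}(x-1)^2$, defined on all of $\mathbb{R}$, for which $\varphi'^{-1}(t) = 1 + t$ and $\varphi^*(t) = \frac{1}{2}t^2 + t$; with $g_0 = 1$ and one constraint function $g_1$, the second condition in (5.3) is then a linear system in $\overline{\lambda}$ that can be solved in closed form. The sample points, weights and target mean are hypothetical.

```python
# Hypothetical finite sample space, reference weights and mean constraint.
x = [0.0, 1.0, 10.0]
p = [0.6, 0.3, 0.1]
target_mean = 9.0
g1 = [xi - target_mean for xi in x]   # constraint: sum_i g1_i * q_i = 0

# First two moments of g1 under P.
m = sum(pi * gi for pi, gi in zip(p, g1))
v = sum(pi * gi * gi for pi, gi in zip(p, g1))

# Second condition in (5.3) with phi'^{-1}(t) = 1 + t:
#   lam0 + lam1 * m = 0            (total mass equal to 1)
#   m + lam0 * m + lam1 * v = 0    (constraint on g1)
lam1 = -m / (v - m * m)
lam0 = -lam1 * m

# s_i = lam^T g(x_i), and dQ*/dP = 1 + s_i.
s = [lam0 + lam1 * gi for gi in g1]
q_star = [pi * (1.0 + si) for pi, si in zip(p, s)]

# Primal value phi(Q*, P) = sum_i p_i * (1/2) * s_i^2 and dual value at (lam0, lam1).
primal = sum(pi * 0.5 * si * si for pi, si in zip(p, s))
dual = lam0 - sum(pi * (0.5 * si * si + si) for pi, si in zip(p, s))
```

With these numbers `q_star` assigns a negative mass to the point `x = 0` while still having total mass 1 and satisfying the constraint, and `primal` equals `dual` up to rounding. No probability measure closer to $P$ is being claimed here; the sketch only shows that the minimizer over signed measures need not be positive.
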



Remark 5.2. Note that, if $\varphi$ is differentiable and strictly convex, then under assumption (3.1),
conditions (5.2) and (5.3) are equivalent; see Theorem 3.4 parts (1) and (2).
In the following Proposition we give other sufficient conditions for assumption (5.2). The
conditions are



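(5.4)  $\phi(\mathcal{M}_g, P) < \infty$;

(5.5)  $\lim_{|x| \to \infty} \frac{\varphi(x)}{|x|} = +\infty$;

(5.6)  for every $\alpha > 0$, and all $i = 1, \ldots, l$, $\int \varphi^*(\alpha |g_i|) \, dP < \infty$;

(5.7)  there exist numbers $1 < r, k < +\infty$ such that $r^{-1} + k^{-1} = 1$,
       $\lim_{|x| \to \infty} \frac{\varphi(x)}{|x|^r} > 0$, and for all $i = 1, \ldots, l$, $\|g_i\|_k < \infty$;

(5.8)  the functions $g_1, \ldots, g_l$ belong to $L_\infty(\mathcal{X}, P)$;

(5.9)  $\varphi(0) = +\infty$;

(5.10)  $a_\varphi = 0$ and $\varphi'(0) = -\infty$;

(5.11)  there exists some $Q_0 \in \mathcal{M}_g \cap \mathrm{dom}\,\phi$ such that $\frac{dQ_0}{dP} > 0$ ($P$-a.e.).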



Proposition 5.3.

(1) Under assumptions (5.4), (5.5), (5.6) and (5.9), condition (5.2) holds.

(2) Condition (5.2) holds also under assumptions (5.4), (5.7) and (5.9).

(3) Condition (5.2) holds also if, in part (1), (5.6) is replaced by (5.8) or/and if condition
(5.9) is replaced by [(5.1), (5.10) and (5.11)].

(4) Condition (5.2) holds also if, in part (2), condition (5.9) is replaced by [(5.1), (5.10) and
(5.11)].

Proof of Proposition 5.3. (1) Since $\mathcal{M}_g$ is closed in $[\mathcal{M}_{\mathcal{F}}; \tau_{\mathcal{F}}]$ (choosing the class $\mathcal{F} = \{g_1, \ldots, g_l\}$),
we can then apply Theorem 2.6 to deduce that there exists at least one $\phi$-projection of $P$ on $\mathcal{M}_g$.
Condition (5.9) implies that $Q^*$ has the same support as $P$. (2) We can apply Theorem 2.7. (3)
Under assumption (5.8), the set $\mathcal{M}_g$ is closed in the $\tau$-topology. Hence, we can apply Theorem 2.5 to
deduce that there exists at least one $\phi$-projection of $P$ on $\mathcal{M}_g$. Conditions (5.1), (5.10) and (5.11)
imply that $Q^*$ has the same support as $P$ (see Lemma 3.7). ■
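
As a quick orientation, the growth and boundary conditions (5.5), (5.7), (5.9) and (5.10) can be checked by hand for three standard generators. The formulas used below are the usual ones for these divergences and are assumptions here, since this section does not restate them.

For the Kullback-Leibler generator $\varphi(x) = x \log x - x + 1$ on $[0, +\infty)$ (with $\varphi = +\infty$ on $(-\infty, 0)$):

$$\frac{\varphi(x)}{x} = \log x - 1 + \frac{1}{x} \to +\infty \ (x \to +\infty), \qquad \varphi(0) = 1, \qquad a_\varphi = 0, \qquad \varphi'(0^+) = \lim_{x \downarrow 0} \log x = -\infty,$$

so (5.5) and (5.10) hold while (5.9) fails.

For the modified Kullback-Leibler generator $\varphi(x) = -\log x + x - 1$ on $(0, +\infty)$:

$$\varphi(0^+) = +\infty, \qquad \frac{\varphi(x)}{x} \to 1 \ (x \to +\infty),$$

so (5.9) holds while (5.5) fails, and the growth requirement in (5.7) fails for every $r > 1$.

For the $\chi^2$ generator $\varphi(x) = \frac{1}{2}(x-1)^2$ on $\mathbb{R}$:

$$\frac{\varphi(x)}{|x|} \to +\infty, \qquad \frac{\varphi(x)}{|x|^2} \to \frac{1}{2} > 0 \ (|x| \to \infty), \qquad \varphi(0) = \frac{1}{2}, \qquad a_\varphi = -\infty,$$

so (5.5) and the growth requirement in (5.7) with $r = k = 2$ hold, while neither (5.9) nor (5.10) holds.
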